1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap:

10 20 30 40 50 60 

orf 103-1. pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 

orfl03ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103-1. pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

orfl03ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103-1. pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I t I I I : I I I I I I I I I i | | | | | 
orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

orf 103-1 .pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl03ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
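The identity figure quoted for this comparison can be checked directly from the two aligned sequences. The following is a minimal Python sketch, assuming an ungapped alignment (as in this comparison) and excluding the terminal X; the function name `percent_identity` is hypothetical and is not part of the original analysis.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Sequences copied from the alignment above, without the terminal X.
ORF103_1 = (
    "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI"
    "GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
    "NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP"
    "NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL"
)
ORF103NG = (
    "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI"
    "GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL"
    "NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP"
    "NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL"
)

print(len(ORF103_1), round(percent_identity(ORF103_1, ORF103NG), 1))
```

With the sequences as printed, six of the 222 positions differ, which is consistent with the stated 97.3%.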

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCGaGGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CC-GCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC 

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA. . 

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 
251 VTTLLPVFTV INTLLGHYVM PETFAAP...
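The correspondence between the partial DNA sequence and the amino acid sequence can be reproduced by conceptual translation. The sketch below is an illustration only, assuming the standard codon assignments and reading frame 1; the name `translate` is hypothetical. Any codon containing an undetermined base (shown as "." in the listing) is reported as X, as in the listing above.

```python
from itertools import product

BASES = "TCAG"
# Standard codon assignments, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; unresolved codons become X."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of the partial sequence above (50 bases, 16 complete codons).
print(translate("ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC"))
# An in-frame stretch containing the undetermined base at position 75.
print(translate("CTGCCGAT.TCCGTG"))
```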

Further work revealed a further partial DNA sequence <SEQ ID 401>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA. . . 
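The sequence listings in this document use rows of five 10-character blocks, each row prefixed by the position of its first residue. A small Python sketch for converting between this layout and a plain string is given below; both function names are hypothetical, and the parser simply discards digits and whitespace, so it also tolerates position numbers that have been split by stray spaces.

```python
import re


def parse_numbered_blocks(listing: str) -> str:
    """Join a numbered block listing into one continuous sequence string."""
    return re.sub(r"[\d\s]", "", listing)


def format_numbered_blocks(seq: str, block: int = 10, per_line: int = 5) -> str:
    """Lay a sequence out as numbered rows of fixed-size blocks."""
    width = block * per_line
    rows = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(rows)


rows = """
1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
"""
sequence = parse_numbered_blocks(rows)
print(len(sequence))
print(format_numbered_blocks(sequence))
```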

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769) 

ORF104 and HI0878 show 40% aa identity in 277aa overlap: 

orfl04 4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62 

Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P 
HI0878 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

orfl04 63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 
HI0878 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 

orfl04 121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 

K+++ QKI ++FND+F +GL Y GV+L G++ WV +AQKL+ 

HI0878 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178 

orfl04 181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

+F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL 
HI0878 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237 

orfl04 241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 

W+ SKVS V TL+P+FT++ + + HY P FAAP 
HI0878 238 NRWDVSKVSWITLVPLFTILFSHIAHYFSPADFAAP 274 
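The alignment above contains gaps, so identity must be counted over aligned columns rather than by direct position. The text does not state which program produced the alignment; purely as an illustration of how a gapped global alignment can be obtained, the sketch below uses a basic Needleman-Wunsch procedure with a simple match/mismatch/gap scheme. The scoring values and function names are assumptions and do not reproduce the scoring used for the alignment shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring the diagonal move.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def identity_over_alignment(top, bottom):
    """Return (identical columns, total alignment columns)."""
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return identical, len(top)


aligned_a, aligned_b = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(identity_over_alignment(aligned_a, aligned_b))
```

Dividing the identical columns by the overlap length gives the kind of percentage quoted in the text (for example, 40% in a 277 aa overlap).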

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orfl04 .pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I M I I I I I I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orfl04a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 

70 80 90 100 110 120 
orfl04 .pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

orfl04a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 104 . pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 

I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 104 . pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 

orf 104a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 
190 200 210 220 230 240 

250 260 270 

orf 104 . pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 

I I I I I I I I I I I I I I I : I I 1 I I I I I : I I I I I 

orf 104a KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALWVGGAVTAAVG 

250 260 270 280 290 300 

The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT
201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT
401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG
601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCTGT TCAAACGCCG CTAG



This encodes a protein having amino acid sequence <SEQ ID 404>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG

301 DRLFKRR*

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 



orf104a.pep  MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1     MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

orf104a.pep  LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
orf104-1     LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

                    130       140       150       160       170       180
orf104a.pep  KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
Homology with a predicted ORF from N. gonorrhoeae

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N. 
gonorrhoeae: 

orf 104 .pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I II 

orfl04ng MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

orf 104 .pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

III II : I I I I I I I I 

orfl04ng LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 120 

orf 104 .pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180 

II II hlllhllllllllll Mill I 

orf 104ng KDRMTAAQKIGLVLLLVGLLMFFNDKFGSLSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 180 

orf 104 .pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I :: I I I I ! I I I I I I I I I I I I 
orfl04ng SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 240 

orf 104. pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277 

I I I I I I I : 11:11111 

orf 104ng KHWEASKVSAVTTLLPVFTVI FSL1GHYVKPDTFAAPDMNGLGYVGALVVVGGAVTAAVG 300 

The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a 
protein having amino acid sequence <SEQ ID 406>: 



101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL
151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG
301 DRPFKRR*

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT
351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT
401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG
601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt
651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCCGT TCAAACGCCG CTAG

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*

ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

10 20 30 40 50 60 

orf 104-1. pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl04ng-l MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 



orf 104-1. pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 



orf 104-1. pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 



SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

I I I I I I I I I I I IN II I I I I I : I I I I I I I I Ill 

SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 
190 200 210 220 230 240 



orf 104-1. pep KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 



In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein:

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62

Identities = 114/280 (40%), Positives = 158/280 (59%), Gaps = 8/280 (2%)

Query: 30  QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88
           Q+P           M WG+LPIA++QVL  ++A T+VW                    P
Sbjct: 3   QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

Query: 89  ...
Sbjct: 63  ...

Query: 147 ...
           +FFND+F +GL Y+ GV+L G++ WV Y +AQKL+
Sbjct: 119 ...

Query: 207 ...
           +F QQILL++Y A F+P A+ - LA +C F+YCCLNTLIGYGS+ EAL
Sbjct: 179 ...

Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT 

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG . . . 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 



1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR 

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP. . . 
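The partial DNA sequence above contains lower-case ambiguity symbols (for example y, s, m, k and r), and the corresponding residues appear as X in the amino acid sequence. The sketch below shows one way such codons can be handled, assuming the symbols follow the standard IUPAC nucleotide codes: each ambiguous codon is expanded to all possible codons and a residue is reported only if every expansion encodes the same amino acid. The function name `translate_codon` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one codon, returning X unless all expansions agree."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:
        return "X"  # symbol outside the IUPAC set, e.g. "."
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


for codon in ("Cys", "rCC", "GyT", "TGy", "GGN"):
    print(codon, translate_codon(codon))
```

Under this rule a codon such as GyT (GCT or GTT) remains X, whereas a codon whose ambiguity is silent, such as TGy, still yields a defined residue.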

Further work revealed the complete nucleotide sequence <SEQ ID 411>:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>: 
1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR 

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 
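A nucleotide sequence described as complete can be given a basic consistency check: the 876-nucleotide sequence above corresponds to 292 codons, that is, 291 residues plus a stop codon, which matches the amino acid sequence shown. The sketch below is a minimal illustration of such a check; it assumes an ATG start codon and the three standard stop codons, and the function name `is_complete_orf` is hypothetical. The short test strings are made-up examples, not sequences from this document.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna: str) -> bool:
    """True if the sequence is ATG ... stop, in frame, with no internal stop."""
    dna = "".join(dna.split()).upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


def encoded_length(nucleotides: int) -> int:
    """Number of residues encoded by a complete ORF of the given length."""
    return nucleotides // 3 - 1  # the final codon is the stop


print(is_complete_orf("ATGAAATAG"))      # hypothetical toy ORF
print(is_complete_orf("ATGTAGAAATAG"))   # internal stop codon
print(encoded_length(876))
```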

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N. 
meningitidis: 



orf105.pep   ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES
orf105a                                    MPTVRFTESVSKHDLDALFEWAKASYGAES

orf105.pep   CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH
orf105a      CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK

orf105.pep   CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR
orf105a      EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR

orf105.pep   SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS
             ||||||||:||||| |||||:||:|||:||||||||||||||||||||||||||||| ||
orf105a      SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS
                    160       170       180       190       200       210

orf105.pep   RGVHNEILYVFDAVLP



The complete length ORF105a nucleotide sequence <SEQ ID 413> is:



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC 

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG 

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC 

251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG

751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This encodes a protein having amino acid sequence <SEQ ID 414>: 



1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA 
51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD 
101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR
151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV
201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL
251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L*



ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap:



orf105a.pep  MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG
             ||||||||||||:||||||||||||||||||||||||||||||||||||:||||||||||
orf105-1     MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG

orf105a.pep  CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA
orf105-1     CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA
                     70        80        90       100       110       120

orf105a.pep  FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVSSGELPSETVC
             ||||||||||||||||:|||||||||||||||||||||:|||||||||||:||:|||:||
orf105-1     FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC



orf 105a. pep 
orfl05-l 



FEKMDIGGLLDAMLSC 



Homology with a predicted ORF from N. gonorrhoeae 

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N. 
gonorrhoeae: 

orf 105 .pep MVARRAHNPKWGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60 

I I I I I I I I I II I I I I I III : I I I I I I I I II I I I I I I I I I I I I I I I II I I I I 

orfl05ng MVARRAHN PKWG SN PAPATKYQTPRFNAEGVL F FLFPAASVFCRI FLPAAI SER 55 

orf 105 . pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120 

orfl05ng QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 115 

orf 105 . pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

I I I I 111111111:11: I I I I I I I I I I : I I I I I I II I 1 I I I I I I I I I I I : III 

Orfl05ng LYLNRLPLGNLSPEWAERIKKDWEAGCSESSMGIFLNADGWPDMGGRLQHLARTWNKAGL 17 5 

orf 105. pep LDGWRNECFDLTDGGGNPLFTLSRAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 24 0 

I Mill Ill : II : II 

orfl05ng LHGWRNECFDLTDGGGNPLFTLSRAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235 



orf 105ng 



AVDPNKLDNTXAGGVSGGEKPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

1111:1111 : I I I I I I I I I I I I I I I II I I I I I : I I I I I I I : I I I I I I 

AVDPGKLDNIAGGGVSGGE^PSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295 



orf 105. pep NEILYVFDAVLP 312 
II I I II I I I I I I 

orfl05ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFTRYG 355 

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a
protein having amino acid sequence <SEQ ID 416>: 
1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLIDAA HPLSEWLDGI RL* 

Further work revealed the complete nucleotide sequence <SEQ ID 417>: 



1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>:

1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L*

ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap:

10 20 30 40 50 60 

orf 105-1. pep MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 

orfl05ng-l MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105-1 .pep CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 

orfl05ng-l CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTO 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 105-1 .pep FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
I I I I I 1 I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I : I I I I I : I I I I I I I I I I I I I I 
orfl05ng-l FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 105-1 .pep RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
I I I I I I I I I I I I I : I I I I I I I : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
orfl05ng-l RESSEEAGLDKTLFPLIRPV3RLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

190 200 210 220 230 240 

250 260 270 280 290 

orf 105-1 .pep FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl05ng-l FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLIDAAHPLSEWLDGIRLX 

250 260 270 280 290 
Furthermore, ORF105ng-1 shows homology with a yeast enzyme:

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22



Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441
           N  G+   WRNE + +      P+  +ER  F  FG LS  VH    + +        W+
Sbjct: 96  NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621
            RRSP K   P  LDN   GG++ G+     + +E SEEA LD +   LI P   +  ++
Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798
              R  +  E+ YVFD  + +  +P   DGEVAGF  + +  +L  +  K+   +  LV
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274

Query: 799 LDAFYRYGLIDAAHP 843
           LD   R+G+I   HP
Sbjct: 275 LDFLIRHGIITPQHP 289
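The percentages in the summary line of this comparison (Identities, Positives, Gaps) follow from the raw counts. The sketch below parses such a line and recomputes each percentage; for these figures the reported values agree with integer truncation of the ratio rather than rounding. The function names are hypothetical and the regular expression covers only the summary format shown here.

```python
import re

SUMMARY = "Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)"
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line: str) -> dict:
    """Map each field name to (count, alignment length, reported percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Percentage with the fractional part discarded."""
    return 100 * count // length


for name, (count, length, reported) in parse_summary(SUMMARY).items():
    print(name, count, length, reported, truncated_percent(count, length))
```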





Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 49 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
419>:



1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC . CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N.
meningitidis:
orfl07 .pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl07a MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 107 . pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 
I I I II I I I I I I I I I I I I I I I I I I I I I III I I I I I I I II I I I I I I I I I I I I I I I I I I 
orf 107a TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT 

70 80 90 100 110 120 



orf 107. pep EAVLKKTLAEQELGRLKLIKGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 
orf 107a EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 



orf 107 .pep 
orfl07a 



The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 



1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG
51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT
101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT
151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT
201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA
251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA
301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA
351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG
401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT
451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG
501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT
551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG
601 GCAGAGCTTT TAGAGCAGAA AGCGAAACTT GATGCCTACC GCCGAGAAGA
651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC
701 TCCCCCAAGC GGCATGA

This encodes a protein having amino acid sequence:



1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA*



Homology with a predicted ORF from N. gonorrhoeae

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. 
gonorrhoeae: 



orf107.pep  MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60
            ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf107ng    MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep  TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120
orf107ng    TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep  EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180
            |||||||||||||||||||| |||||||||||||||| |||||||||||||||||||||:
orf107ng    EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180

orf107.pep  KYRFLSAQ 188
orf107ng    KYRFLSAQ 188

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ*

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 



Example 50 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
425>: 



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 



1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 
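
The correspondence between the nucleotide listing and the amino acid listing can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code, translation in the first reading frame, and the convention (inferred from the listings, not stated in the text) that a codon containing an unread base, shown as "." or "N", is reported as X. The function name `translate` is illustrative only.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1; any codon with an unread base becomes 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


# First 60 nucleotides of SEQ ID 425, copied from the listing above.
prefix = "ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC GTGCGGCAAA"
print(translate(prefix))  # MLNTFFAVLGGCLLXLPCGK
```

The output agrees with the first twenty residues of SEQ ID 426, including the X at position 15 that arises from the unread base in the fifth block.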

Further work revealed the following DNA sequence <SEQ ID 427>: 



1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 



1 MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. gonorrhoeae

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N. 
gonorrhoeae: 

orf108.pep  MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60
            II: I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I
orf108ng    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
            I I I I I I I I I I I I I I I I I I I I I:: I I 1111:11111 111 I : I : I I I I I I I I I I
orf108ng    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
            I I : I I : I I I I I I II
orf108ng    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 

orf108-1.pep  MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
orf108ng-1    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
orf108ng-1    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
orf108ng-1    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
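
The identity figure quoted for this ungapped overlap can be reproduced by counting identical positions. The sketch below is illustrative only: the text does not state which alignment program produced the figure, the helper name `percent_identity` is hypothetical, and the two sequences are transcribed from SEQ ID 428 and SEQ ID 430 with the residue pair VV at positions 96-97 of the gonococcal sequence read from the nucleotide sequence SEQ ID 429.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in an ungapped, equal-length overlap."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF108-1 (SEQ ID 428), without the terminal stop symbol.
ORF108_1 = "".join("""
MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI
AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC
METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ
AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y
""".split())

# ORF108ng (SEQ ID 430), without the terminal stop symbol.
ORF108NG = "".join("""
MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI
AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC
METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ
AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y
""".split())

print(len(ORF108_1), round(percent_identity(ORF108_1, ORF108NG), 1))  # 181 92.3
```

Fourteen of the 181 positions differ, which gives the 92.3% stated above.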

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA Aacgccgtcc 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>:

1 MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI

51 AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop, double- 
underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
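
A motif of this kind can be located by simple pattern matching. The sketch below assumes the PROSITE-style consensus for ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST]; the text does not say which tool or pattern definition was used for the analysis, and the names `P_LOOP` and `find_p_loops` are illustrative only.

```python
import re

# PROSITE-style consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST], written here as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# Residues 1-70 of the gonococcal protein (SEQ ID 430).
ORF108NG_PREFIX = (
    "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAI"
    "AGLALGQSSEGKTNDGKKQI"
)

print(find_p_loops(ORF108NG_PREFIX))  # [(56, 'GQSSEGKT')]
```

Under this assumption the only hit in the first 70 residues is GQSSEGKT at positions 56-63.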



Example 51



The following DNA sequence was identified in N. meningitidis <SEQ ID 431>:



WO 99/24578 






PCT/IB98/01665 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence <SEQ ID 433>: 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
orf109a     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
            |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109a     TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                    70        80        90       100       110       120

orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ



The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 



1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT

401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA



This encodes a protein having amino acid sequence <SEQ ID 436>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF* 

ORF109a and ORF109-1 show 99.2% identity in 262 aa overlap: 



orf109a.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA

orf109a.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
orf109-1     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP

orf109a.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                    130       140       150       160       170       180

orf109a.pep  LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
orf109-1     LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI

orf109a.pep  SMAVKLLIDERNPLYQMIVSMFX
             |||||||||||||||||||||||
orf109-1     SMAVKLLIDERNPLYQMIVSMFX

Homology with a predicted ORF from N. gonorrhoeae

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N. 
gonorrhoeae: 

orf 109 .pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60 

orf 109ng MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60 

orf 109. pep TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120 

I I I I I : I I I : I I I I I I I I I I I I 

orf109ng TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf 109. pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180 

orf 109ng KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180 

orf 109. pep IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231 

orfl09ng IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231 

An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino 
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>: 



1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in 262 aa overlap:



                        10        20        30        40        50        60
orf109ng-1.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109ng-1.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109ng-1.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109ng-1.pep  LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109ng-1.pep  SMAVKLLIDERNPLYQMIVSMFX
                |||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3'REGION (ORF9) 
>gi I 94984 Ipir | 1 138164 hypothetical protein 9 - Pseudomonas sp >gi|551929 
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261
Score = 175 bits (439), Expect = 3e-43 

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%) 

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100 

PP+ + TNKLQ R+G ++ K+ LP+ D+ 

Sbjct: 43 PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102 

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160

L A++P LLI +ALYF P + G + +R++ F+F LT+ PL+GFYDGVFGPG GSFF 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161 

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220 

++ F+ L G +L A ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254 

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 52 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>: 

1 . . CTGCTAGGGT ATTGCATCGG TTATCGGTAC GgCTGTTGCA GCAAAACCAG 

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT 

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 

301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC 

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC 

501 GGTCGGATTG TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC 

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 . . LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with ORF88a from N. meningitidis (strain A) 

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:

orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA
                                          ||||||||||:|||||||||||||||||||
orf110                                    LLGIASVIGTLLQQNQPQTDYLVKFGSFWA

orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
orf110      XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH

orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL



However, ORF88 and ORF110 do not align, because they represent two different fragments of the
same protein. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N.
gonorrhoeae: 

orf110.pep                                LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                          I I I I I I I I I I : I I I I I I I I I I II:
orf110ng    MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep  XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
            M i 1 1 1 1 1 i 1 1 1 1 i I I 1 1 1 1 1 1 1 1 I 1 1 I
orf110ng    RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
orf110ng    SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep  GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
            I II: 111111111:1 III: II MM II I I : I I I I I I I I I I I I I I : I II I I
orf110ng    GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep  S 211
orf110ng    S 241

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a
protein having amino acid sequence <SEQ ID 444>: 

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLIMXNLLL KLGMLAGSIF

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S* 

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 

101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGT-ACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 



Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N.
meningitidis: 

                     10        20        30        40        50        60
orf111a.pep  MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP
             ||||||||||||:|||||:|||||||||||||||||||||||||||||||||||| ||||
orf111       MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf111a.pep  AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH
             |||| ||||||||||||||||||||||||||||||||||||||||||||||||:||||||
orf111       AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf111a.pep  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
orf111       GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf111a.pep  AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ
             ||||||||||||||| |||||||||||||||||||||||| |||||||||||||||||||
orf111       AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf111a.pep  GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM
orf111       GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                    250       260       270       280       290       300

                    310       320       330       340       350
orf111a.pep  TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
             |||| |||||||||||||||||||||||||||||||||||||||||||||||
orf111       TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                    310       320       330       340       350

The complete length ORF111a nucleotide sequence <SEQ ID 447> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC

51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG

101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT

151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT

651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG

701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG

751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA

801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG

901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC

1051 CGCTAA



This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR

101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE

201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL

251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM

301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL

351 R*



Homology with a predicted ORF from N. gonorrhoeae 

ORF111 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORF111.ng) from N.
gonorrhoeae: 

                  10        20        30        40        50        60
orf111ng  MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
orf111    MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                  10        20        30        40        50        60

                  70        80        90       100       110       120
orf111ng  AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
          |:|||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf111    AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                  70        80        90       100       110       120

                 130       140       150       160       170       180
orf111ng  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK
          |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf111    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                 130       140       150       160       170       180

                 190       200       210       220       230       240
orf111ng  AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ
          ||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||:|
orf111    AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                 190       200       210       220       230       240

                 250       260       270       280       290       300
orf111ng  GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM
orf111    GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                 250       260       270       280       290       300

                 310       320       330       340       350
orf111ng  TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX
          ||||||||||||||||||:|||:|||||||||||| |||||||||| |||||
orf111    TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                 310       320       330       340       350

The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:

1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC 

51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg 

101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT 

151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT 

201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG 

251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACTCAC GGCGCACTC-G ACGTAACCGT CGGCCCTTTG GTCAACCTTT 

401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA 

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT 

651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG 

701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg

751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA 

801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac 

851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG 

901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC 

951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG 
1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC
1051 CGCTAA 

This encodes a protein having amino acid sequence <SEQ ID 450>: 

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ

151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE

201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM

301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 

351 R* 

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi|1074292|pir
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346
Score = 353 bits (896), Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66 

+ LI +I + L AC ++T + ++L G+TMGTTY VKYL + S K +

Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58 

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125 

I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118 

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185

           TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245 
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P + 

Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238 
Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305 

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL 
Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297 

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349 

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL 
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 54 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>: 

1 . . CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 

301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 

401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 

501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 

551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 

601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 

651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 

701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 

751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA. . 

This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 

1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE 
51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ 
101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA 

151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG 
201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR 
251 FGIEAGWKGH MSA. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with putative secreted VirG-homologue of N. meningitidis (accession number A32247) 
ORF and virg-h protein show 51% aa identity in 261aa overlap: 



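[Alignment of Orf35 with virg-h; the layout is only partly recoverable. Surviving fragments:] 

--GAAGSDLYGYGt 

RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 

V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ - 

[Surviving row coordinates (Orf35 / virg-h): 5 / 396, 64 / 456, 122 / 516, 182 / 576, 242 / 636.] 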



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. 
meningitidis: 

10 20 30 

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 

orf35a 

Orf35.pep GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf35a GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV 

370 380 390 400 410 420 

100 110 120 130 140 150 

orf35.pep YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I : I 

orf35a YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV 
430 440 450 460 470 480 

orf 35 . pep 
orf35a 

490 50C 

220 230 240 250 260 

orf35.pep LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 
orf35a KEAALSLKWLFX 
610 620 

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA 

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA 

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 
1851 GCTGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 454>: 

1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD 

51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP 

101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ 

151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE 

201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG 

251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC 

301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR 

351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM 

401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY 

451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP 

501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ 

551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG 

601 YGKRTDGDKE AALSLKWLF* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. 
gonorrhoeae: 

orf35.pep                           PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34 
orf35ngh  FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370 

orf35.pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91 
orf35ngh  KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430 

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151 
orf35ngh  AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 490 

orf35.pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211 
orf35ngh  HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550 

orf35.pep GVNLQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA 263 
orf35ngh  GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610 



A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>: 

1 . . KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD 

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 55 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>: 

1 . . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 

51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 

101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 

Further work revealed a further partial nucleotide sequence <SEQ ID 459>: 

1 . . GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAA.CTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA 

451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC 

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>: 

1 . . AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N. gonorrhoeae: 



orf46.pep AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45 

I I I I I I I II 

orf46ng PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217 

orf46.pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105 

orf46ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277 

orf46.pep RVIQQTSAPDKHGXLSSDSGN 126 

I I I I I I I I I I I I I I 

orf46ng RVIQQTSAPDKHGVLSSDSGN 298 

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 462>: 



1 . . RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC 

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER 

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV 

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD 

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL 

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC 

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg 

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

301 ttccattcgc ccttcGAcaa CcaTGCCTCA CATTCCGATT CTGACGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC 

1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 

1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT 

1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 

1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 

1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 

1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC 

1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA 

1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 

1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 

1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 

1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC 

1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 

This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL 

51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK 

101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP 

451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK 

501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE 

551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD 

601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE* 

ORF46ng-1 and ORF46-1 show 94.7% identity in 227 aa overlap: 

                                  10        20        30        40
orf46-1.pep               AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
orf46ng-1   LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 
                    10        20        30        40        50        60

orf46-1.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 
orf46ng-1   NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 
                    70        80        90       100       110       120

             110       120       130       140       150       160
orf46-1.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
orf46ng-1   VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 
                   130       140       150       160       170       180

             170       180       190       200       210       220
orf46-1.pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
orf46ng-1   TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 
                   190       200       210       220       230       240

orf46-1.pep I 
orf46ng-1 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF46ng-l shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of 
N. meningitidis: 
LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 



SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 



VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 



NAAQGIEAVSNIFTAVI PVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF 



ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK 



GFPNFEKDVKYDTRINTAVPQVN— 



-PKGSVGSAHSWSITARIQYAKLP 



orf 46a. pep 
orf 4 6ng-l 



RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 



The complete length ORF46a DNA sequence <SEQ ID 465> is: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC 

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA 

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence <SEQ ID 466>: 

1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL 

51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 

101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF 

451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG 

501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 

551 GKITHK* 

Based on this analysis, including the presence of an RGD sequence in the gonococcal protein, typical 
of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
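
The RGD motif referred to in this paragraph can be located with a plain substring scan. The following is a minimal sketch and not part of the original analysis: the function name `find_motif` and its `first_position` parameter are hypothetical, and the test fragment is residues 501-550 of <SEQ ID 464> as listed earlier in this Example.

```python
def find_motif(protein, motif="RGD", first_position=1):
    """Return the positions (numbered from first_position) where motif starts.

    Whitespace in the listing (the 10-residue block separators) is ignored.
    """
    seq = "".join(protein.split()).upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(first_position + start)
        # Step by one so that overlapping occurrences would also be found.
        start = seq.find(motif, start + 1)
    return hits


# Residues 501-550 of <SEQ ID 464> (ORF46ng-1), copied from the listing.
fragment = "TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE"
print(find_motif(fragment, "RGD", first_position=501))
```

Passing the row's starting coordinate as `first_position` keeps the reported positions in the numbering of the full-length listing.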



Example 56 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGL. . . 
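
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA codon by codon. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the helper name `translate` is hypothetical, and any codon containing an ambiguous or unread base (N, a dot, and so on) is reported as X, in the same way as the listings.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA listing in frame 1; '*' marks a stop codon."""
    # Remove the block separators and normalise lower-case bases.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with any non-ACGT character are not in the table.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of <SEQ ID 467>.
print(translate("ATGAATATTC ACACCCTGCT CTCCAAACAA"))
```

A trailing partial codon is ignored, so a listing that stops mid-codon still translates up to the last complete triplet.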

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 
401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLNFK IK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI 

10 20 30 40 50 60 



70 80 90 100 110 119 

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 
I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48a ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL 

70 80 90 100 110 120 



The complete length ORF48a nucleotide sequence <SEQ ID 471> is: 



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 
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501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 
801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 
901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 
951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 472>: 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN 
51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV 
151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 
251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR 
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 
351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 
401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 
451 NLNETFRYLK QGHVXWLNFK IK* 

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 



orf48a.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI 
orf48-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 



10 20 30 40 50 60 



70 80 90 100 110 120 

orf48a.pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL 

II III I II I I I I I I 

orf48-1 ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf48a.pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA 

I II I I I I I I I I I I I I Illl: : I I I I I III I 

orf48-1 LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf48a.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

III Ill I 

orf48-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf48a.pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR 

I I I I I I I I : I I I I I I I I I I 

orf48-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf48a.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

I I I I I I I I I I I I I I I I I I I I I 

orf48-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 
370 380 390 400 410 420 

orf48a.pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ 

orf48-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 
370 380 390 400 410 420 

430 440 450 460 470 

orf48a.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 
430 440 450 460 470 

Homology with a predicted ORF from N. gonorrhoeae 

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

1111:111:11 I I I I I I I I I I I I I I I 

orf48ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119 

Ml MINIMI MM 

orf48ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120 
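
The percent identity figures quoted for these alignments can be reproduced from the aligned rows. The following is a minimal sketch, assuming identity is counted as identical, non-gap columns divided by the number of aligned columns; the function name `percent_identity` is hypothetical, and the exact convention of the alignment program used for the original analysis is not stated here.

```python
def percent_identity(row_a, row_b):
    """Return (matches, overlap, percent) for two aligned rows.

    Rows must already be aligned and of equal length; '-' marks a gap.
    Unknown residues (X) are compared literally.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a != "-"
    )
    return matches, len(row_a), 100.0 * matches / len(row_a)


# First ten residues of ORF48 and ORF48ng from the alignment above.
matches, overlap, pct = percent_identity("MNIHTLLSKQ", "MNIHALLSEQ")
print(f"{matches}/{overlap} = {pct:.1f}%")
```

Applied to the full 119-residue rows rather than this ten-residue excerpt, the same count gives the overlap-wide figure.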

The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino 
acid sequence <SEQ ID 474>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV 
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN 
201 PYASMGNGG. . 

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>: 

1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG 

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG 

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TtCttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT 

1401 GCACTTCAAA ATCAAATAA






This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-1>:






1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL
251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVAWLHFK IK*
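
The correspondence between a nucleotide sequence such as <SEQ ID 475> and the amino acid sequence it encodes can be illustrated with a minimal Python sketch that translates a DNA string in reading frame 1 using the standard genetic code. The function name `translate` and the rendering of ambiguous or incomplete codons as `X` are assumptions made for this illustration only; they are not part of the original disclosure.

```python
# Minimal sketch: translate a DNA coding sequence into one-letter amino acids.
# The table below is the standard genetic code, listed in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of `dna`; whitespace is ignored, case is folded.

    Codons containing anything other than A, C, G or T are reported as 'X'
    (an assumption of this sketch). A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


if __name__ == "__main__":
    # First 30 nucleotides of the gonococcal sequence shown above.
    print(translate("ATGAATATTC ACGCCCTGCT CTCCGAACAA"))
```

Applied to the first thirty nucleotides of the complete gonococcal sequence, the sketch returns the first ten residues listed for ORF48ng-1; the final nine nucleotides (`ATCAAATAA`) give the C-terminal `IK` followed by the stop symbol.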

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:

10 20 30 40 50 60 

orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
            ||||:|||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf48ng-1   MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI



LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
130 140 150 160 170 180 



KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP 
190 200 210 220 230 240 






CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

II I I I I I I I I I I I I I I I : I I I I I I I I II 

CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE

310 320 330 340 350 360



370 380 390 400 410 420 

LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 

LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 
370 380 390 400 410 420 

430 440 450 460 470 

FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX 

I I I I I I II I : I I I I I I I I I I I I I I I I I I ! I 11 I I I I I 11111:11111 

FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX 

430 440 450 460 470 
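
Identity figures such as those quoted for the alignments in this document express the fraction of aligned positions that carry the same residue in both rows. A minimal Python sketch of that calculation is given below. The function name `percent_identity` and the decision to skip any column containing a gap character (`-`) are assumptions of this example; the alignment program used to produce the figures in this document may count positions differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (matches, overlap, percent) for two equal-length aligned rows.

    Columns in which either row holds the gap character are excluded from
    the overlap. Any other symbol, including the terminal 'X' used in the
    alignments above, is compared literally (an assumption of this sketch).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    overlap = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        overlap += 1
        if a == b:
            matches += 1
    percent = 100.0 * matches / overlap if overlap else 0.0
    return matches, overlap, percent


if __name__ == "__main__":
    # Final block (positions 421 onward) of the ORF48-1 / ORF48ng-1 alignment.
    orf48_1 = "FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX"
    orf48ng_1 = "FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX"
    m, n, pct = percent_identity(orf48_1, orf48ng_1)
    print(f"{m}/{n} identical ({pct:.1f}%)")
```

Running the function over each block of an alignment and summing the match and overlap counts gives the overall figure for the whole overlap.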



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 . . GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC . G TAGCGCCGAA 

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT 

401 TGATCAATAT GTACGCC . . 

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 

1 . . VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA 
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA 

101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 479>: 



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>:

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF53 shows 93.5% identity over a 139 aa overlap with an ORF (ORF53a) from strain A of N.
meningitidis:
VSGRYRALDRVSKIIIVTLSIATLAAAGIA

orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG



The complete length ORF53a nucleotide sequence <SEQ ID 481> is: 

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG 

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA

This encodes a protein having amino acid sequence <SEQ ID 482>: 

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

ORF53a shows 100.0% identity in 417 aa overlap with ORF53-1:

10 20 30 40 50 60 

orf53a.pep  MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF

orf53-1     MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF

FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL

FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
70 80 90 100 110 120 

130 140 150 160 170 180 

MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I 1 I I I I I I I I I I I 
MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF 

130 140 150 160 170 180 

190 200 210 220 230 240 

IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS

II I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS

190 200 210 220 230 240 

250 260 270 280 290 300 

AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT

AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
250 260 270 280 290 300 

310 320 330 340 350 360 

TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
310 320 330 340 350 360 

370 380 390 400 410 

IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX

III IN I I I I I I I I I I I I I I I I I I I I I I I I 

IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX 
370 380 390 400 410 

Homology with a predicted ORF from N. gonorrhoeae

ORF53 shows 92.1% identity over a 139 aa overlap with a predicted ORF (ORF53ng) from N.
gonorrhoeae:



orf53.pep                                 VSGRYRALDRVSKIIIVTLSIATLAAAGIA  30
                                          ||||||||||||||||||||||||||||||
orf53ng     AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA  91

orf53.pep   MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG  90
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng     MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG  151

orf53.pep   IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA  139
            ||:|||||||||||||||||||  :  ||| :|||:|||| ||||||||
orf53ng     IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV  211

An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino 
acid sequence <SEQ ID 484>: 

1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA
51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP
101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD
151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA
201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP
251 IVLLEKLGGR HRFGRDFLV*

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:

1 . . aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC 
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA 

101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG 



151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT
201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG
251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT
301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT
351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT
401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT
451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT
501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA
551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC
601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT
651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG
701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG
751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG
801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT
851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG
901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG
951 CCTGCTCTAC CTGGCCGGGT [illegible] GTTCCTGTTG AACCTTACCG
1001 GACTTTTGGC ATAG


This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:

1 . .KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF
101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI
151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT
201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL
251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*

ORF53ng-1 and ORF53-1 show 94.0% identity in 336 aa overlap:

60 70 80 90 100 110 

ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA 
: I I I I I I I I I I I I I I I I I I I I I I I I I I I 
KKSCVYLWVFLILCIASATINAGAVAIVTA
10 20 30 

120 130 140 150 160 170 

AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM

II I II I I I I I I II I I I I Ill 

AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM
40 50 60 70 80 90 






180 190 200 210 220 230 

orf 53-1. pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 

I I I I I I I 1 II I I I I I I I I I I I Ill 

orf53ng-l SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 

100 110 120 130 140 150 

240 250 260 270 280 290 

orf53-1.pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA

I I I I I I I I I I I I I I I I I I I I I I I I I II 

orf53ng-1   FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA

160 170 180 190 200 210 

300 310 320 330 340 350 

orf 53-1. pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG 

I I I I II I I I I I I I I I I I I I I I ! I I I I : I I I I I I I : I I I I I I I I I I 

orf53ng-1   FIAFACMYGTTITVVDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG

220 230 240 250 260 270 

360 370 380 390 400 410 

orf 53-1. pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 

orf53ng-l AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL 
280 290 300 310 320 330 



orf 53-1. pep NLAGMFKX 
orf53ng-l NLTGLLAX 
Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>: 

1 . .TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT 
51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA 
101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG 
151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT 

201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG 
251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC 
301 GTTCCGCCT . . 

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 

1 . . LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP. .

Further work revealed the complete nucleotide sequence <SEQ ID 489>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG 
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC 

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA

2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*

Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 shows 96.6% identity over an 89 aa overlap with an ORF (ORF58a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf58.pep   LRETAYVLDSFDRYFVVALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                          :::|||||||||||||||||||||||||||||||||||||||||||
orf58a           MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD

orf58.pep   FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP
            |||||||||||||||||||||||||||||||||||||||||||
orf58a      FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD
The complete length ORF58a nucleotide sequence <SEQ ID 491> is:



1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA
501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC
1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA
1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC
1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG
1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC
1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG
1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT
1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT
2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC
2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG
2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG
2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC
2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA



This encodes a protein having amino acid sequence <SEQ ID 492>: 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL
551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX
601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP XDNA*



ORF58a and ORF58-1 show 96.6% identity in 1014 aa overlap:



orf58a.pep  MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
orf58-1     MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA



orf58a.pep  LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT



190 200 210 220 230 240 

orf58a.pep  EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM

orf58-1     EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
190 200 210 220 230 240 

250 260 270 280 290 300 

orf58a.pep  FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS
            |||||||||||||||||||||||||||||||||||||||||:||||||||||||||||||
orf58-1     FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS

250 260 270 280 290 300 

310 320 330 340 350 360 

orf58a.pep  QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN

orf58-1     QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN
310 320 330 340 350 360 

370 380 390 400 410 420 

orf58a.pep  VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMPAXDIPPPPPVSEIY

llll: I I I II I I I I I II I I I I I I I I I 

orf58-1     VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 58a. pep NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 

orf 58-1 NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf58a.pep  EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP
I I II I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I I I II I I I ! I I I 
orf58-1     EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP

490 500 510 520 530 540 
550 560 570 580 590 600

orf58a.pep  GATQTEEXLLXNSITIEEKXAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKX

orf58-1     EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
550 560 570 580 590 600 

610 620 630 640 650 660 

orf58a.pep  LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! 
orf58-l LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 



670 680 690 700 710 720 

orf58a.pep  TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY

I I I I I I I I I I I I I I Ill I I I I I I I I I I I I I I I I I I 

orf58-1     TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
670 680 690 700 710 720 



730 740 750 760 770 780 

EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI 

EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
730 740 750 760 770 780 



850 860 870 880 890 900 

orf 58a . pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINIM 

orf58-1     QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
850 860 870 880 890 900 



910 920 930 940 950 960 

orf 58a . pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 

I II I II II II I I I I M M II I I M I I M I MM |::| I I : I I I I I I I I I I I I I I I 
orf 58-1 VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 

910 920 930 940 950 960 

970 980 990 1000 1010 

orf 58a. pep VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPXDNAX 

II I II I M I I I I I I II II II I I I I i I II I II ! I II I II I I I I I I I I II I I II II 
orf 58-1 VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 

970 980 990 1000 1010 



Homology with a predicted ORF from N. gonorrhoeae

ORF58 shows complete identity over a 9 aa overlap with a predicted ORF (ORF58ng) from N.
gonorrhoeae:



orf58.pep   ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP  103
                                          |||||||||
orf58ng                                   SEPDRPVPPASANRADVPTASDGYSDSGNG  30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 494>: 



1 . . SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE 

51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS 

101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR 

151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK 

201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL 

251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA 

301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA 

351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA 

401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN 

451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV 

501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI 

551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV 

601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN 

651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD 

701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA 

751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML 

801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL 

851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ 

901 MEAEGIVSAP EHNGNRTILV PLDNA* 



This partial gonococcal sequence contains a predicted transmembrane region and a predicted 
ATP/GTP-binding site motif A (P-loop; double-underlined). Furthermore, it has a domain 
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK 
(accession number P46889) shows 65% amino acid identity in a 459 aa overlap: 

IEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 52 6 
+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPWTDLGKAPHL 586 
IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPWADLAKMPHL 987 

LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 64 6 
LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 1047 



— LEKLPFIVVWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762 

L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 



Start positions of the alignment rows: 

ORF58ng: 467     FtsK: 868 
ORF58ng: 527     FtsK: 928 
ORF58ng: 587     FtsK: 988 
ORF58ng: 647     FtsK: 1048 
ORF58ng: 705     FtsK: 1108 
ORF58ng: 763     FtsK: 1168 
ORF58ng: 823     FtsK: 1228 
ORF58ng:         FtsK: 1287 
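
The P-loop prediction mentioned above can be illustrated with a short pattern search. This is a minimal sketch, assuming the PROSITE-style consensus [AG]-x(4)-G-K-[ST] for ATP/GTP-binding site motif A; the function name `find_p_loops` is hypothetical, and the fragment is copied from the ORF58ng rows of the FtsK alignment. It is not the analysis software used for the original prediction.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST], written here as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# Fragment taken from the ORF58ng side of the alignment with FtsK.
fragment = "GKAPHLLVAGTTGSGKSVGVNAMILSMLFKA"
hits = find_p_loops(fragment)
print(hits)
```

The start position is relative to the fragment, not to the full-length protein.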



Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 



1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA 

501 AACTTCGCCC GTCCGTCCGG TTTTTAAG3A AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAGCAGCGCG GCTTTAAG3G AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA 

1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC 

1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG 

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC 

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC 

1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG 

17 51 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAG CAT TT AC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AG AAAAAAT C GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA 

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2 601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA 

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG 

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC 

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC 

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG 

2 951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>: 

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK 

51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS 

101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR 

151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI 

201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE 

251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS 

301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR 

351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV 

401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN 

451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER 

501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL 

551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD 

601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS 

651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA 

701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK 

751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI 

801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG 

851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR 

901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET 

951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE 

1001 HNGNRTILVP LDNA* 
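
The correspondence between the nucleotide sequence and the amino acid sequence can be checked by translating in frame. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` is hypothetical, and codons containing an unread base are reported as X. The two test fragments are the first 30 and the last 15 nucleotides of <SEQ ID 495> as printed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, unknown codons give X."""
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


n_terminus = translate("ATGTTTTGGA TAGTTTTGAT CGTTATtgtg")
c_terminus = translate("TTGGACAATG CTTGA")
print(n_terminus, c_terminus)
```

The stop codon is rendered as `*`, matching the convention used in the sequence listings.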



ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap: 

10 20 30 40 50 60 

orf 58-1. pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

orf58ng-l MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 58-1 . peo LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I II I I I I I I I I I I I I I I I I 
orf58ng-l LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58-1. pep EEAETEEAEAAEEEAAD7EDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
I I I I II I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I III: I I I I I I I I I I I I 
orf58ng-l EEAETEAAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMQSESKTSPVRPVFKEITL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 58-1. pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 

orf58ng-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58-1 . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 

orf58ng-l FDADKEAFSE SADYGFE P YFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKS PDVS 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58-1. pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I : I I I I I I I I I I I I I I I I 
orf58ng-l QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 58-1 . pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY 

orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf 58-1 . pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf 58-1. pep EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 

orf58ng-l EAFGHDSQAVCPFEDVPS2RPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
490 500 510 520 530 540 

550 560 570 580 590 600 

orf 58-1. pep EATQTEEELLENSITIEEKLAE FKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 

orf58ng-l EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 
550 560 570 580 590 600 

610 620 630 640 650 660 

orf 58-1. pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

orf58ng-l LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
610 620 630 640 650 660 

670 680 690 700 710 720 

orf 58-1. pep TGQPWTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 

orf58ng-l TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPfMLELSIY 
670 680 690 700 710 720 
orf 58-1 .pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 



Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK: 

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (Dl 
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell 
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division 
protein FtsK [Escherichia coli] Length = 1329 
Score = 576 bits (1469), Expect = e-163 

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%) 

IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRWET 615 
+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
VEARLADFRIKADWNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRWEV 927 

IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 675 
IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTWLGKDIAGEPVVADLAKMPHL 987 

LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPWTDMK 735 
LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEWTDMK 1047 
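
The percentages in the summary line of this report can be reproduced from the counts. The sketch below assumes the percentages are truncated to whole numbers rather than rounded, which is an inference from the printed figures (353/459 is about 76.9% but is reported as 76%); the function name `report_percent` is hypothetical.

```python
def report_percent(count: int, length: int) -> int:
    """Whole-number percentage obtained by integer truncation."""
    return (100 * count) // length


identities = report_percent(301, 459)
positives = report_percent(353, 459)
gaps = report_percent(5, 459)
print(identities, positives, gaps)
```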
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 59 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC. .GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL 

51 ALVGFWV 

// 

301 ...IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR 

351 VRSMPSQPFW QAVGKSLTLK GGK* 

Further work revealed the complete nucleotide sequence <SEQ ID 499>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR 

351 SMPSQPFWQA VGKSLTLKGG K* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF101 shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with 
an ORF (ORF101a) from strain A of N. meningitidis: 



orf 101 .pep 
orflOla 



MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX 
MIYQRNLIKEL3FTAVGIFWLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 



orf 101. pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 101a LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 290 300 310 320 330 



orf 101. pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 

I I I I I I I I I : I :: I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 101a LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 

The complete length ORF101a nucleotide sequence <SEQ ID 501> is: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 
101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA 
151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC 
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG 
251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC 
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 
551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC 
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA 
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN 
751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT 
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 
901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 
1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 
1101 GAAAGGCGGA AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA 

51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR 

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX 

251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 

301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap: 

orf101a.pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60 
orf101-1 MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 60 

orf101a.pep PLLLVLTAFISTLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 
orf101-1 PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 

orf101a.pep IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 
orf101-1 IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 

orf101a.pep DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240 
orf101-1 DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 240 

orf101a.pep IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 
orf101-1 IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orf101a.pep LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360 
orf101-1 LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 360 

orf101a.pep VGKSLTLKGGK 371 
orf101-1 VGKSLTLKGGK 371 
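
A percent identity figure of the kind quoted for these comparisons can be computed from two aligned sequences of equal length. The following is a minimal sketch for ungapped rows; the function name `percent_identity` is hypothetical, and counting an unread residue (X) as a mismatch is an assumption rather than the documented behaviour of the alignment program. The two rows are the 121-180 block of the ORF101a / ORF101-1 alignment.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


orf101a_row = "IPWAELRSREYAEILKQKQELSLVEAGGFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ"
orf101_1_row = "IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ"
block_identity = percent_identity(orf101a_row, orf101_1_row)
print(round(block_identity, 1))
```

The value applies to this 60-residue block only; the figure for the full overlap depends on every block of the alignment.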



Homology with a predicted ORF from N. gonorrhoeae 

ORF101 shows 96.5% identity in 57aa overlap at the N-terminal domain and 95.1% identity in 
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORF101ng) from N. 
gonorrhoeae: 

orf 101. pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orflOlng MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 59 

// 

orf 101. pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333 

orflOlng SLTVSVLLLCLLAVPLS YFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331 

orf 101. pep LLPMHI IMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373 
orflOlng LLPMHI IMFVIAIVLLRVRSMPSQPFWQAVG 362 

The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial 
amino acid sequence <SEQ ID 504>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 

301 LIAIGLFLIY QHGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 

351 SMPSQPFWQA VG. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 505>: 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 
51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC 
101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC 
151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC 
201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG 
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC 
301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA 
351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA 
401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT 
451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC 
501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG 
551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC 
601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC 
651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta 
701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT 
751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT 
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG 
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 
1101 GAAAGgcgGA AAATGA 



This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 
351 SMPSQPFWQA VGKSLTLKGG 



ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap: 



orf101-1.pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 
I I I I I I I I I I I I I I II I I I I I I I | I I I I I 
orf101ng-1 MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT 
10 20 30 40 50 60 


70 80 90 100 110 120 

PLLLVLTAFI STLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I : I : I I I I I I I 
PLLLVLTAFI STLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV 

70 80 90 100 110 120 



130 140 150 160 170 180 

IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGJMKNLFLREQ 

I 111111111:11 MINIMI 

IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ 
130 140 150 160 170 180 



190 200 210 220 230 240 

DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLI ISTTPKL 

M II I II M I : I I I I I I I I I I I I I II I II II I 

DKNGGDNI I FAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLI ISTTPKL 
190 200 210 220 230 240 

250 260 270 280 290 300 

IDPVSHRRTIPTAQLIGSSNPQHQASLMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 

MMMMMIMI II M II I II I II II II II II I II I 

IDPVSHRRTISTAQLIGSSNPQHQASLMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 
250 260 270 280 290 300 

310 320 330 340 350 360 

LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 

II I II I II I II II II I II I I I I I M I I 1 I I I I I 1 I I = : I 1 I I I I I I 

LIAIGLFLI YQNGLTLLFEAVEDGKIHFWLGLLPMHTIMFVIAIVLLRVRSMPSQPFWQA 

310 320 330 340 350 360 

370 

VGKSLTLKGGKX 

II II II I II M I 

VGKSLTLKGGKX 



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 60 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>: 

1 . . GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC 

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC 

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT . GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

4 01 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>: 



1 . . GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR 
51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN 
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF113 and pspA show 44% aa identity in 179aa overlap: 

orfll3 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A 
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256 

or f 1 1 3 PVWGQDVRVVAGQNDVAATGDAHSPI LXXXXXXXXXXXXXXGTH I PLFAI DTGKLGGMYA 120 

VWG+DV+VV+G+N + G + P AIDT LGGMYA 

pspa GVWGKDVKVVSGKNKLDFDG S LAKT AS AP S S S DS VT PT VAI DTAT LGGMYA 307 

orf 113 NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 17 9 

+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N 

pspa DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362 

Homology with a predicted ORF from N. gonorrhoeae 

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa 
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae: 

orfll3 GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

orfll3ng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 22 4 

orf 113 QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRWAGQNDVAATGDAHSPILNNA 90 

II hi I II I 1: Mill 1:1 II 
orfll3ng QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

orfll3 IDTGKLGGXVCQQNHLDQYGRASRHS 135 

orf!13ng DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 



The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 



1 MNKTLYRVIF NRKRGAWAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 
51 SKAFCFSALG FSLCLALGTV NIAFADGIlT DKAAPKTQQA TILQTGNGIP 






101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 
151 TRGEARWVN QINSSHPSQL NGYIEVGGRR AEWIANPAG IAVNGGGFIN 
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 
251 NHLDQYGRTS RHS* 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>: 

1 . . TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

2 01 AT AC AT TAT C AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

2 51 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

4 01 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

4 51 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

7 01 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

7 51 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT. . 

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>: 

1 . . STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF115 and pspA protein show 50% aa identity in 325aa overlap: 



Orf115: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60 
STG+S Y E++ +I +G AY+ + + P + NGI +T 
pspA: 778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTWPWAENGIHPTFT 831 

Orf115: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120 
LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+ 
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 

Orf115: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180 
L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 

Orf115: 181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239 
WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G IAG 
pspA: 951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAG 1009 

Orf115: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299 
R ALI+N + N+ G + + A DI N G + AE LLL A 
pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068 

Orf115: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324 
+ R+AGIY+TG++ G 
pspA: 1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093 

Homology with a predicted ORF from N. gonorrhoeae 
ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from 
N. gonorrhoeae: 

orfll5.pep STGHSEQNYTLPREITRNISLGSFAYESHRK 31 

III I I I I I I I : I I I I : I I I II I I I I I I I 
orfll5ng NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71 

orfll5.pep ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81 

111:11111 II III : : I 

orfll5ng ALSRHAP5QGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131 

orf115.pep DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orfll5ng DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191 

orfllS.pep EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201 
25 | | | | | | I I || I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 

orfll5ng EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251 

orf 115 . pep VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 2 61 
I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf115ng VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311 

orf 1 15 . pep SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 

II : M I I I I I I I I I I : : : 1111:11111 I I I I I I I 

orfll5ng SAVTATQDINNIGGILSAEQTLLLNAGKNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371 



orf115ng EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431

An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino
acid sequence <SEQ ID 514>:

1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 
301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T*

This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa:

20 30 40 50 60 70 

orf115ng-1.p NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK

orfll5 STGHSEQNYTLPREITRNISLGSFAYESHRK 

10 20 30 

80 90 100 110 120 130 

orfll5ng-l.p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 

III: I I I I I I I : : I I I I I I I I 

orfll5 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 
140 150 160 170 180 190

orfll5ng-l.p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 

orfll5 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
90 100 110 120 130 140

200 210 220 230 240 250 

orfll5ng-l.p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 

orf115 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ

150 160 170 180 190 200 

260 270 280 290 300 310 

orfll5ng-l.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
15 I I I I I I I I i I I I I I I 1 I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK

210 220 230 240 250 260 

320 330 340 350 360 370 

orf115ng-1.p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK

orf115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK
270 280 290 300 310 320 

380 390 400 410 420 430

orfll5ng-l.p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 

orfll5 EKGV 

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 



Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I
Sbjct: 797 MGISAYKGY APQQASDIPGTV--VPWAENGIHPTFT LPNSSLFAI 840

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359

+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS
Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI +
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378

Query: 659 QTYEQKGLTVAFSSPVTD 676

Q YEQKG+TVA S PV +
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 62 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 517>: 

1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT 

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG
351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC
451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT
501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG 
551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 
601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC 
651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA ... 

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>: 

1 . . SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941)
ORF117 and pspA protein show 45% aa identity in 224aa overlap:

Orf117: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63

++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 
pspA: 1173 DIRIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232 

Orf117: 64 HETAQSSTFEGKQVVLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123

+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 

Orf117: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS
pspA: 1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352

Orf117: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226

P+G+ I+ IIAAN++ +Q YEQK +TVA S PV +
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396
Homology with a predicted ORF from N. gonorrhoeae

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from 
N. gonorrhoeae: 

orf 117. pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

orfll7ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480 

orf117.pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90

: I Ml I I I I I I ! I I I I I I I I I 

orfll7ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540 

orf 117 .pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

orfll7ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

orf 117 .pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210 

I I I I I I I I I I I I I I I I I I I : I I I I I i I : I I I I I I I I I : I I : I I I I : I : I I I : I I I I 
orfll7ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

orf 117. pep YEQKXLTVAFSSPVTDLAQQ 230 
I I I I I I I I I I I I I I I I I I I 

orfll7ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720 
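The alignment above places the ORF117 fragment against residues 451 onward of the longer gonococcal ORF. As a rough illustration only (this is not the alignment program used in the original analysis, and the function name is hypothetical), a gap-free sliding comparison in Python can locate such a fragment within a longer sequence. The two input strings are taken from the sequences listed in this example: the first 30 residues of ORF117 and residues 441-480 of ORF117ng.

```python
def best_ungapped_offset(fragment: str, target: str) -> tuple[int, int]:
    """Slide `fragment` along `target` without gaps and return
    (0-based offset, number of identical residues) for the best placement."""
    if len(fragment) > len(target):
        raise ValueError("fragment must not be longer than target")
    best_offset, best_matches = 0, -1
    for offset in range(len(target) - len(fragment) + 1):
        window = target[offset:offset + len(fragment)]
        # Count positions where both sequences carry the same residue.
        matches = sum(1 for a, b in zip(fragment, window) if a == b)
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# Residues 1-30 of ORF117 and residues 441-480 of ORF117ng, as listed in this example.
orf117_1_30 = "SGNNLNAKAAEVSSANGTLAVSANNDINIS"
orf117ng_441_480 = "IQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS"

offset, matches = best_ungapped_offset(orf117_1_30, orf117ng_441_480)
print(offset, matches)  # 10 25
```

An offset of 10 within a window that starts at residue 441 corresponds to residue 451 of ORF117ng, which is where the alignment above begins. A real comparison would also allow gaps and score conservative substitutions, which this sketch deliberately omits.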

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino
acid sequence <SEQ ID 520>:

1 . . LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>:

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

ORF117ng-1 shows the same 90% identity over a 230aa overlap with ORF117. In addition, it
shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 



Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359

+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS
Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAENALLLKASNNIESRSETRSNQNEQGSVRN 1078

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419

+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 63 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>:

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA

501 CGTGCGCATC GACTTCATCT CCTAT . . . 

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK

151 PLITLKELSK VELSWFDVRI DFISY... 
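The correspondence between a DNA sequence and its amino acid sequence can be checked by translating the DNA in reading frame 1. The following Python sketch is illustrative only and is not part of the original analysis; it assumes the standard genetic code, and the helper name `translate` is hypothetical. Codons containing an ambiguous or unreadable base are reported as X, matching the convention used in the listings.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; unresolved codons become 'X'."""
    # Remove the spaces that group bases in blocks of ten and normalise case.
    seq = "".join(dna.split()).upper()
    protein = []
    # Ignore any trailing partial codon.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 60 bases of the partial meningococcal sequence listed above.
fragment = "ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA CAACATGTAT"
print(translate(fragment))  # MIYIVLFLAVVLAVVAYNMY
```

Stop codons are returned as `*`, as in the protein listings of this document.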

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA
751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>:

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf119.pep MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM
I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf119a MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM

10 20 30 40 50 60 

70 80 90 100 110 120 

orfll9.pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

I I I I I I I 1 I I I I I III I I I I I I I I I I I I I I II I I I I I 

orf119a MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120 

130 140 150 160 170 

orfll9.pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 

I 1:1 I I I I II I I I I I I I 11111:11111 

orf119a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

130 140 150 160 170 180 

orfll9a AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
190 200 210 220 230 240 

The complete length ORF119a nucleotide sequence <SEQ ID 527> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>: 



MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR
DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 
FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 
PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 
CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 
HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS
AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 
MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV
RTYVLARQSE MLKVGIEPGG KTALRLFS* 



ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap:



10 20 30 40 50 60 

orf119a.pep MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM
I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM

10 20 30 40 50 60 



70 80 90 100 110 120 

orf119a.pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

I I I I I I I I I I I I I I I I I I I I ! 1 I I I I I I I I I I I I I I I I I I I I I ! II I ! I I I I I I 

orf119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH

70 80 90 100 110 120 



130 140 150 160 170 180 

orf119a.pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

II I I I I I I I I I I I I II I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I II 

orf119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE

130 140 150 160 170 180 



190 200 210 220 230 240 

orf119a.pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS

I I I I I I I I I 1 I I II I I : I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 119a . pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

11111:1111 I I I I I I I I I I I I I I I I I I I I I I 

orf 119-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf119a.pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA

I I I I I I I I I : I I I I I M I I I I I I I I I I I I I I I I I I I I I II I II 

orf 119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf 119a. pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I 
orf 119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420 



429 
orf119a.pep KTALRLFSX
orfll9-l KTALRLFSX 
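Percent identity figures such as the one quoted for this alignment are the number of identical aligned positions divided by the overlap length. The Python sketch below is illustrative only and is not the program used to produce the alignment; the function name is hypothetical. It is applied to residues 1-60 of the two proteins as encoded by the nucleotide sequences listed above.

```python
def percent_identity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Return (identical positions, overlap length, percent identity) for two
    already-aligned sequences of equal length ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A gap aligned to a gap is never counted as an identity.
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    overlap = len(seq_a)
    return identical, overlap, 100.0 * identical / overlap


# Residues 1-60 of ORF119a and ORF119-1.
orf119a_1_60 = "MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM"
orf119_1_1_60 = "MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM"

identical, overlap, pct = percent_identity(orf119a_1_60, orf119_1_1_60)
print(f"{identical}/{overlap} identical = {pct:.1f}%")  # 58/60 identical = 96.7%
```

By the same definition, 422 identical positions out of 428 would round to 98.6%.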

Homology with a predicted ORF from N. gonorrhoeae

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from 
N. gonorrhoeae: 



orf119.pep MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60
1 1 1 I 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II
orf119ng MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 60

orf119.pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120
1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
orf119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120

orf119.pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175
llll Ill 11:11111 1 11:11111
orf119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180



The complete length ORF119ng nucleotide sequence <SEQ ID 529> is:

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTC3A CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence <SEQ ID 530>: 

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS*

ORF119ng and ORF119-1 show 98.4% identity over a 428 aa overlap:

orf119ng    MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM  60
orf119-1    MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM  60

orf119ng    MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH  120
orf119-1    MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH  120

orf119ng    TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE  180
orf119-1    TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE  180

orf119ng    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS  240
orf119-1    AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS  240

orf119ng    AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS  300
orf119-1    AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS  300

orf119ng    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA  360
orf119-1    AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA  360

orf119ng    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG  420
orf119-1    GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG  420

orf119ng    KTALRLFSX  429
orf119-1    KTALRLFSX  429



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 64 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>:

1 . . GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC . ATCAG 

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT 

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT 

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA 

451 TTGGCACAGG ATTGA 

This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 



1 . . ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA

151 LAQD*

Further work revealed the complete nucleotide sequence <SEQ ID 533>:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>:



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*
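
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in frame with the standard genetic code. The following is a minimal sketch, not the software used for the analysis in this document. The function name `translate` and the choice to write any codon containing an ambiguous or unread base as `X` are assumptions made for illustration; they mirror how the listings above show `X` for unresolved residues and `*` for a stop codon.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1.

    Whitespace between the 10-base blocks is ignored and case is folded.
    Codons containing anything other than A, C, G or T (an assumption of
    this sketch) are reported as 'X'; stop codons are reported as '*'.
    A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


if __name__ == "__main__":
    # First 30 bases of SEQ ID 533 as listed above.
    print(translate("ATGTCGGTGC AAGCAGTATT GGCGCACAAA"))
```

Applied to the first row of SEQ ID 533, the sketch gives the first ten residues of ORF134-1; applied to a stretch containing the ambiguity code `y`, it gives an `X` at that position, as in the partial ORF134 listing.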

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical protein o648 of E. coli (accession number AE000189)
ORF134 and o648 protein show 45% aa identity in 153aa overlap:

Orf134: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61

RHG +DFF N D + + VE TT T++ VVGGIGVMNIMLVSVTERT+EI

o648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555

Orf134: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121

GIRMA+GAR ++ QQFLIEA F+ + + S ++++

o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154

A CST GI FG++PA AA+L+P+DALA++
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648



Homology with a predicted ORF from N. meningitidis (strain A)

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N. 
meningitidis: 



orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL  30
orf134a     GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL  264

orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG  90
orf134a     ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG  324

orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA  150
orf134a     LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA  384

orf134.pep  LAQDX
orf134a     LAQDX



The complete length ORF134a nucleotide sequence <SEQ ID 535> is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT
51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG
101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG
151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG
201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA
251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT
301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA
401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA
701 AA3CGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAC- CGACAGCATC
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT
951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG
1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC
1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC
1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG
1151 CATTGGCGCA GGATTGA



This encodes a protein having amino acid sequence <SEQ ID 536>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 



orf134a.pep  MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG  60
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG  60

orf134a.pep  FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV  120
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV  120

orf134a.pep  RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD  180
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD  180

orf134a.pep  ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE  240
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE  240

orf134a.pep  DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA  300
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA  300

orf134a.pep  IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC  360
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC  360

orf134a.pep  STGIGIAFGFMPANKAAKLNPIDALAQDX
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134ng) from N.
gonorrhoeae: 

orf134.pep                                ARHGTEDFFMNNSDXIRQIVESTTGTMKLL  30
orf134ng    GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL  264

orf134.pep  ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG  90
orf134ng    ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG  324

orf134.pep  LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA  150
orf134ng    LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA  384

orf134.pep  LAQD  154
orf134ng    LAQD  388

The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 

701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 

751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 538>: 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI

251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA

301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS






351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap:

orf134ng    MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSMGTNTISIFPGRG  60
orf134-1    MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG  60

orf134ng    FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV  120
orf134-1    FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV  120

orf134ng    RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD  180
orf134-1    RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD  180

orf134ng    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE  240
orf134-1    ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE  240

orf134ng    DFFMNNSDSIRQMVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA  300
orf134-1    DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA  300

orf134ng    IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC  360
orf134-1    IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC  360

orf134ng    STGIGIAFGFMPANKAAKLNPIDALAQDX
orf134-1    STGIGIAFGFMPANKAAKLNPIDALAQDX
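
The identity figures quoted for these pairwise comparisons can be reproduced, at least approximately, by counting identical columns in two aligned rows. The sketch below illustrates that calculation; it is not the alignment program used for this document, which is not named here. The function names `percent_identity` and `mismatches`, and the conventions that `-` marks a gap and that a column with a gap or an `X` never counts as a match against a different character, are assumptions.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue.

    The rows must already be aligned (equal length). Columns in which both
    rows have a gap ('-') are ignored; a gap opposite a residue counts as a
    mismatch. These conventions are assumptions of this sketch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = [
        (a, b)
        for a, b in zip(row_a.upper(), row_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def mismatches(row_a: str, row_b: str) -> list:
    """Return (1-based column, residue in row_a, residue in row_b) per difference."""
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(row_a.upper(), row_b.upper()))
        if a != b
    ]


if __name__ == "__main__":
    # Residues 61-80 of the gonococcal and meningococcal ORF134 sequences.
    ng = "FGDRRSGKIKTLTIDDAKII"
    nm = "FGDRRSGRIKTLTIDDAKII"
    print(percent_identity(ng, nm), mismatches(ng, nm))
```

Over a full alignment the same count is taken across every row; a program that scores gaps or ambiguous residues differently may report a slightly different percentage.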



ORF134ng also shows homology to an E. coli ABC transporter:



Score = 297 bits (753), Expect = 6e-80 

Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%) 



[Query/Sbjct alignment not recoverable from the source. Surviving coordinates: Query 1, 61, 121, 180, 240, 300, 360; Sbjct 260, 320, 380, 440, 500, 560, 620. Surviving match-line fragments:]

AQWV+D N + +LF +D +G+ IL
FG+S VL +W PY+T+ ++ G+S NSITV++K+
AE+ L LL RHG
A+GAR ++LQQFLIE
CST GI FG++PA AA+L+P+DALA++
WGGIGVMNIMLVSVTERT+EIGIRM



- + S +++ A

Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 



1 ..GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT
51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT
101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG
151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG
201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT
251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC
301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC
351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG
401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG
451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA
501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG
551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CgTCAGCGGT
601 ATTTTGA









This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 



1 . . GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV 
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 541>:



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>:



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 *

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF135 shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) from strain A of N. 
meningitidis: 



orf135.pep                                GTGAMLLLFYAVTILPLATGVTLSYTSSIF  30
orf135a     STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIF  109

orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK  90
orf135a     LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK  169

orf135.pep  VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM  150
orf135a     VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM  229

orf135.pep  TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVFX
orf135a     TRAYKVGDKFTVASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAF  289

orf135a     KQRLQSLFRQRX

The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 544>: 



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR

301 *

ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 



orf135a.pep  MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL  60
orf135-1     MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL  60

orf135a.pep  RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE  120
orf135-1     RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE  120

orf135a.pep  RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG  180
orf135-1     RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG  180

orf135a.pep  WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT  240
orf135-1     WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT  240

orf135a.pep  VASLSYMTVVFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR  300
orf135-1     VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR  300

Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from
N. gonorrhoeae:

orf135.pep                                GTGAMLLLFYAVTXLPLATGVTLSYTSSIF  30
orf135ng    STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF  335

orf135.pep  LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLK  90
orf135ng    LAVFSFLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK  395

orf135.pep  VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM  150
orf135ng    VRELSLAGEPGWRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM  455

orf135.pep  TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAVF  201
orf135ng    TRAYKVGDKFTVASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIISAAF  506

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino
acid sequence <SEQ ID 546>:

1 MPSEKAFRRH LRTASFQGLH LHHFHQKV3K CGIIGFGIHI FPTLLPAAQG
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM
251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV
351 YTQAVLLLGF AGVVLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI
451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAL SAAFFLGEEL FWQEILGMCI
501 IISAAF*



Further work revealed the following gonococcal sequence <SEQ ID 547>: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG

651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:



1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS

51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV

101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT

201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV

251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR

301 *

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:

orf135ng-1.pep  MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL  60
orf135-1        MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL  60

orf135ng-1.pep  RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE  120
orf135-1        RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE  120

orf135ng-1.pep  RISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG  180
orf135-1        RISVYTQAVLLLGFAGVVLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG  180

orf135ng-1.pep  WRVVFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT  240
orf135-1        WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT  240

orf135ng-1.pep  VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR  300
orf135-1        VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR  300

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
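
Putative transmembrane domains of the kind referred to above are often located by scanning a protein sequence for hydrophobic stretches. The sketch below is one common way to do this and is not a statement of the method actually used for this analysis, which is not given here. It averages the Kyte-Doolittle hydropathy scale over a sliding window; the window of 19 residues and the cut-off of 1.6 are the values usually quoted for that scale and are assumptions, as are the function names and the treatment of unknown residues (`X`) as neutral.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Residues not in the table (for example 'X' or '*') score 0.0, which is
    an assumption of this sketch. Sequences shorter than the window give
    an empty profile.
    """
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_segments(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """Merge overlapping above-threshold windows into (start, end) segments.

    Coordinates are 1-based and inclusive, matching the residue numbering
    used in the sequence listings.
    """
    segments = []
    for start, mean in enumerate(hydropathy_profile(protein, window)):
        if mean < threshold:
            continue
        first, last = start + 1, start + window
        if segments and first <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], last)  # extend the open segment
        else:
            segments.append((first, last))
    return segments


if __name__ == "__main__":
    # Residues 1-50 of SEQ ID 548 (ORF135ng-1) as listed above.
    print(candidate_segments("MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFS"))
```

Segments returned this way are only candidates; a threshold-based scan neither confirms membrane insertion nor distinguishes a leader peptide from a transmembrane helix.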



Example 66 



The following DNA sequence was identified in N. meningitidis <SEQ ID 549>: 

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC 

401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA

501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCATATCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG 

701 GAATAG

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 



1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY 
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE* 

Further work revealed the complete nucleotide sequence <SEQ ID 551>:



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:



1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of N.
meningitidis:

10 20 30 40 50 59 

orfl36.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I i I I I I 

orf 136a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60 



60 70 80 90 100 110 119 
orf 13 6. pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 
: : I I I I : I INI MM 

orf 136a PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 
70 80 90 100 110 120 



120 130 140 150 160 170 179 

orf 13 6. pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

orf 13 6a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 



180 190 200 210 220 230 

orf 13 6. pep AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSAS DS DLKS SXXSEX 

orf 13 6a R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is: 



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG 

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA 

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG 

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 



This encodes a protein having amino acid sequence <SEQ ID 554>: 



1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ 
51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN 
101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 
151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 
201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 



ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 



orf136a.pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60 

70 80 90 100 110 120 
orf136a.pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 

orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 
70 80 90 100 110 120 

130 140 150 160 170 180 
orf136a.pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 

orf136-1 AFVGTVYRFVCLFYIINDGIAHH-- 



Homology with a predicted ORF from N. gonorrhoeae 

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from 
N. gonorrhoeae: 

orf136.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59 

orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60 

orf136.pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ 119 

orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120 

orf136.pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I i I 1 I I I : I I I I I I 

orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180 

orf136.pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 
I I : I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I MM I I I II II II II 
orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 



The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 
51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 
101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA 
151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 
201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 
251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 
301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 
351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 
401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 
451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 
501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 
551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 
601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 
651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 
701 CGGAATAG 



This encodes a protein having amino acid sequence <SEQ ID 556>: 



1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ 

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI 

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 



orf136ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 

orf136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

orf136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 

I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I MINI 
orf136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAGFIVQ 

orf136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

orf136ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 

I I: I I I I I I I I I I I I I I I I I I I I: Illllll I Illllllll 

orf136-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC.. 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA.. 



Further work revealed the complete nucleotide sequence <SEQ ID 559>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 
901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>: 

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL 
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE 
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 
201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL 
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 
301 * 
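
The correspondence between a nucleotide sequence and its deduced amino acid sequence, such as <SEQ ID 559> and <SEQ ID 560> above, can be checked by conceptual translation. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard genetic code and reading frame 1, and it reports any codon containing an unresolved base (N, or a position marked '.') as X. The name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the same codon assignments apply to the sequences in this document).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; '*' marks a stop codon.

    Row numbers and spaces from the printed listing are ignored. Codons that
    contain anything other than A, C, G or T are reported as 'X'.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha() or ch == ".")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First row of the complete nucleotide sequence <SEQ ID 559>.
first_row = "ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC"
print(translate(first_row))
```

Applied to the first row of <SEQ ID 559>, the sketch yields MENMVTFSKIRPLLAI, which agrees with the first sixteen residues listed for <SEQ ID 560>.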

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of 
N. meningitidis: 

10 20 30 40 50 60 

orf137.pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 

orf137a MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf137.pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 

I : I I I I I 1111111:1 

orf137a VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

70 80 90 100 110 120 

130 140 149 

orf137.pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 
I I I I I t I I I I I I I : I = I I I I I I I I I I 
orf137a FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 562>: 

1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 

orf137a.pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVGLALGGGASKGFAH 

orf137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 

orf137a.pep VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf137a.pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf137a.pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I 

orf137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

orf137a.pep MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf137-1 MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 
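
The percentage identity quoted for pairwise comparisons of this kind can be illustrated with a short calculation. The Python sketch below is illustrative only and is not the program used to produce the figures in this document. It assumes the two sequences have already been aligned to equal length with '-' as the gap character, and it counts identical columns over all aligned columns; an alignment program may define the overlap differently, so the sketch is not expected to reproduce the reported values exactly. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows.

    Assumption: gap columns ('-') count towards the overlap length but never
    as identities.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# First 60 aligned positions of ORF137a and ORF137-1, in 10-residue groups.
ORF137A_N60 = (
    "MENMVTFSKI" "RPLLAIAAAA" "LLAACGTAGN"
    "NAARKPVQTA" "KPAAVVGLAL" "GGGASKGFAH"
)
ORF137_1_N60 = (
    "MENMVTFSKI" "RPLLAIAAAA" "LLAACGTAGN"
    "NAVRKPVQTA" "KPAAVVGLAL" "GGGASKGFAH"
)

print(f"{percent_identity(ORF137A_N60, ORF137_1_N60):.1f}")  # 59 of 60 columns: 98.3
```

Over these 60 positions the two sequences differ at a single column (A versus V at position 33), giving 59/60, or about 98.3%.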

Homology with a predicted ORF from N. gonorrhoeae 

ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from 
N. gonorrhoeae: 

orf137.pep MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 60 

I I I I II : I I I I I I I I I I I I I I I I I : I I I I I I I I II I I I: I I I I I I I I I I I I I 

orf137ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH 60 

orf137.pep VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120 

: I I : I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I 
orf137ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 

orf137.pep FIKGAKLQNYINRKLRGMQIQQFPIKFAA 149 

orf137ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180 

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG 

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>: 



1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL 

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 



orf137ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAVVALALGGGASKGFAH 

I II : I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I 

orf137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 

orf137ng IGIVKVLKENGIPVKVVTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
: I I: I I I I I ! I I I I I I I I : I I I I I I I 1 II 

orf137-1 VGIIKVLKENGIPVKVVTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf137ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf137ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I :: I I I I I I I I I I I I I 

orf137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

orf137ng MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

orf137-1 MSVSALQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
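
A lipid attachment site of this kind is usually recognised by a short consensus pattern ending in the cysteine that becomes lipid-modified. The Python sketch below is illustrative only: this section does not state which program or pattern was used for the prediction. The regular expression is written from recollection of the PROSITE-style consensus for prokaryotic membrane lipoprotein lipid attachment sites and should be checked against the current database entry before use. The window of positions 15 to 35 for the cysteine is likewise an assumption, and any further rules a prediction program may apply are not implemented. The name `find_lipid_attachment_sites` is hypothetical.

```python
import re

# Assumed consensus: six residues other than D, E, R, K; two of LIVMFWSTAG;
# one of LIVMFYSTAGCQ; one of A, G, S; then the cysteine.
# The lookahead allows overlapping candidate sites to be reported.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def find_lipid_attachment_sites(protein: str, min_cys: int = 15, max_cys: int = 35):
    """Return (1-based cysteine position, matched segment) for each candidate site."""
    hits = []
    for m in LIPOBOX.finditer(protein):
        cys_pos = m.start() + 11  # the cysteine is the 11th residue of the match
        if min_cys <= cys_pos <= max_cys:
            hits.append((cys_pos, m.group(1)))
    return hits


# First 30 residues of the gonococcal protein <SEQ ID 564>.
orf137ng_nterm = "MENMVTFSKIRSFLAIAAAALLAACGTAGN"
print(find_lipid_attachment_sites(orf137ng_nterm))
```

Under these assumptions the only candidate in the first 30 residues of <SEQ ID 564> ends at the cysteine at position 25 (segment AIAAAALLAAC).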



Example 68 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC.. 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 






Further work revealed the complete nucleotide sequence <SEQ ID 567>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of 
N. meningitidis: 

10 20 30 40 50 60 

orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

orf138a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
10 20 30 40 50 60 

orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
orf138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf138.pep LLF 

orf138a LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 570>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 

orf138a.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf138a.pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

Mill: Ill I I I . I I I I I I I I II I I I I 

orf138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf138a.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
orf138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orf138a.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf138-1 VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf138a.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I 
orf138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

Homology with a predicted ORF from N. gonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from 
N. gonorrhoeae: 



orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

I I I I I I II I I I I I I I I II I I I I I I I I I I I I 

orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I II I I I I I I II 
orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 

orf138.pep LLF 
orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180 



The complete length ORF138ng nucleotide sequence <SEQ ID 571> is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 
201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 
251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 
301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 
351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 
451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 
501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 
551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC 
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA 

651 ACCTGCATAC acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 

701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 

751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 

801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC 

851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET 

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng and ORF138-1 show 94.3% identity over 299aa overlap: 

orf138-1.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf138-1.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 

orf138-1.pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

I I I I I I I I I I I I I I I I I I I ! I I : I I I 

orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 

orf138-1.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf138ng VKQIIKALRAGEATIILPDHVPSPQEGG-GVWADFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf138-1.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 
I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
orf138ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP 

In addition, ORF138ng is homologous to htrB protein from Pseudomonas fluorescens: 

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253 
Score = 80.8 bits (196), Expect = 9e-15 

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%) 



+G+ +IK +R G I D P P E G++ FF 



Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis 
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it 
is a useful immunogen. 



Example 69 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>: 

1 ..GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 

401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 

551 CGCGGGCGAT GGTGCTG.. 

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>: 



1 ..AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW 

51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 

101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 

151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL.. 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 



1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 

601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 

851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 

951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>: 

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS 

301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG 

351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL 

501 LDGGEGGKQT ETL* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of 
N. meningitidis: 



orf139.pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 

I : I I I I I I I I I I I 

orf139a QSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWSAGESWRVLMESETWQAVWNTXRFSAAA 
270 280 290 300 310 320 

orf139.pep VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 

100 110 120 130 140 150 

orf139.pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

orf139a LAYPFVAKDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
390 400 410 420 430 440 

160 170 180 189 

orf139.pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 



The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 



1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG
301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN
351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC
701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC
801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT
851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT
951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG
1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC
1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC
1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT
1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT
1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG
1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA

This encodes a protein having amino acid sequence <SEQ ID 578>:

1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ
151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL
501 LDGGEGGKRT ETL*

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 

or f 13 9a . pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

: I I 1 I I I! : I I I I I I I I II I I I I I 

or f 13 9-1 MDGRRWWWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

or f 13 9a . pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG 

orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

or f 13 9a. pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 

orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orf 139a. pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 
I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orf 139a . pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIVVKAWS 

orf 13 9-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orf 139a. pep ageswrvlmesetwqavwntxrfsaaavyaaavlgvvyaaaarrsawmrglmft.pftwsp 

II 1 1 11 1 1 1 II II I II I i I II 1 1 1 1 1 1 1 

orf 139-1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 

orf 13 9a. pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF 

I I I I I I I I I I I I 

orf 13 9-1 VCVSAGVLLL YPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 13 9a . pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 

I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I III 

orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

or f 1 3 9 a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 

I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orf 13 9-1 ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from 
N. gonorrhoeae: 

orfl39.pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 30 

orfl39ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 

orf 13 9. pep VYAAAVLGWYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

I : I I I I I I I I I I I HI MINIM MINIUM II ||| 

orfl39ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387 
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orfl39.pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

orfl39ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 447 

orfl39.pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 189 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl39ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGE DNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a
protein having amino acid sequence <SEQ ID 580>:

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*

Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>: 

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT

1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA
1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 
1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-1>:

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK

51 RLAWTVFQAA ATCVLVLPLG VPVAWVL ARL AFPGRALVLR LLMLP FVMPT 

101 LVAGVGVLAL FG ADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA

201 LLLGGSRYAT VEVEI YQLVM FELDMAGA SA LVWLVLGVTA AAGLL YAWFG 

251 RRAVS DKAVS PVMPSPPQSV GEYVLLAFS V AVLSVCCLFP LSAIVVKAWS

301 AGESRRVLME SETWQAVWNT LRFS AAAVFA AAVLGVVYAA AA RRLVWMRG

351 LVF LPFMVSP VCVSAGVLLL YPGWTASL PL LLAMYALLAY PFVA KDVLSA 

401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARA MVLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 
ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap:

orfl39ng MDGRCWAVRGAFSLLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
I I I I 1:1 I I I : I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 139-1 MDGRRWWWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orf 139ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

I I I I I I I I I I I Ill Ill 

orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orfl39ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 

orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orf 139ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I 

orfl39-l AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orfl39ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGWYAAAARRLVWMRGLVFLPFMVSP 

orf 139 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 

orfl39ng VCV5AGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 139-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPP/AKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

I IN I I I I I I I I I I I I I I I I I I I I II I I I I I 

orf 139-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orfl39ng ARAMVLTLLLSAFAVCIFLLLDNGEGGKRTETL 

orf 1 3 9- 1 ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 






Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 AACGTTTGGT C. . . 

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 
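The correspondence between a nucleotide listing and its encoded amino acid sequence can be checked by in-frame translation. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and reading frame 1; the function name `translate` and the constant names are illustrative and do not come from the specification. The result can be compared with the orf140.pep rows of the alignments given later in this Example.

```python
# Standard genetic code (NCBI translation table 1), with codons ordered
# T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1.

    Whitespace is ignored, a trailing incomplete codon is dropped, and any
    codon containing an ambiguous base (e.g. N) is reported as 'X'.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# Partial meningococcal sequence <SEQ ID 583> as listed above (261 nucleotides).
SEQ_ID_583 = """
ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC
ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG
CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
AACGTTTGGT C
"""

if __name__ == "__main__":
    print(translate(SEQ_ID_583))
```

Reporting ambiguous codons as 'X' mirrors the way undetermined residues are written in the amino acid listings of this document.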



Further work revealed the complete nucleotide sequence <SEQ ID 585>: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG 






201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA


This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND ILVKNFGGTL C-GVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAK AGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMA PAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF140 shows 95.4% identity over a 87aa overlap with an ORF (ORF140a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orfl40.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATG LPTGSIVKD 

Orfl40a MDGWTQTLSAQTLLGISAAAI ILILILIVKFRIHALLTLVIVSLLTALATG LPTGSIVND 

10 20 30 40 50 60 



orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV
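The identity figure quoted for this overlap can be reproduced by counting matching positions. The sketch below assumes an ungapped overlap of equal length; `percent_identity` is an illustrative helper, not a routine named in the specification. The ORF140 string follows the translation of <SEQ ID 583> (ending ...ERLV), and ORF140a residues 61 to 87 are taken from the <SEQ ID 588> listing, since the second alignment block shows only the ORF140 row.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions over an ungapped overlap of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# 87-residue overlap between ORF140 and the N-terminus of ORF140a.
ORF140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
ORF140A = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)

if __name__ == "__main__":
    identity = percent_identity(ORF140, ORF140A)
    print(f"{identity:.1f}% identity over {len(ORF140)} aa")
```

With these two strings, 83 of 87 positions match, which rounds to the 95.4% stated above.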



The complete length ORF140a nucleotide sequence <SEQ ID 587> is: 



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA


This encodes a protein having amino acid sequence <SEQ ID 588>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAK AGTV VAIMLIPMLL 

251 I FLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AAGFTDWQLA

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 

orf 140-1 .pep MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

I ! I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 140a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

orf 140-1 .pep ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120 

: I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I t I I I 
orf 140a VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120 

orf 140-1 .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I 
orf 140a GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180

orf 140-1 .pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240 

I I I I I I I I ! I I I I I ! I ! I I II 

orf 140a ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240

orf 140-1 .pep VAIMLIPMLL I FLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 140a VAIMLIPMLL I FLNTGVSALISEKLVSADETWVQTAKI IGSTPIALLISVLVALFVLGRK 300 

orf 14 0-1. pep RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I 

orf 140a RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360 

orf 140-1 .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 420 

I I I I I I I I I I I I > I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | 

o r f 1 4 0 a FLVALALRI AQGSATVALTTAAALMAPAVAAAG FT DWQLAC I VLATAAG SVGCSHFNDSG 420 

orf 140-1. pep FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461 

II II I I I I I I I I I I I I I I I I I I 

orfl40a FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461 

Homology with a predicted ORF from N. gonorrhoeae

ORF140 shows 92% identity over a 87aa overlap with a predicted ORF (ORF140ng) from
N. gonorrhoeae: 
orf140.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60

III I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I : II I I I I I I II I I I I I I : I 
orfl4 0ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 

orf140.pep ILVKNFGGTLGGVALLVGLGAMLERLV 87

orf140ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120
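Because the match lines of this alignment are not fully legible, it can help to list the differing positions explicitly. The following sketch assumes an ungapped overlap and reports each difference as reference residue, 1-based position, and replacement residue; the helper name `substitutions` is illustrative only.

```python
def substitutions(ref: str, other: str) -> list[str]:
    """List differences as e.g. 'W4R': ref residue, 1-based position, other residue."""
    if len(ref) != len(other):
        raise ValueError("sequences must be of equal length")
    return [
        f"{a}{position}{b}"
        for position, (a, b) in enumerate(zip(ref, other), start=1)
        if a != b
    ]


# 87-residue overlap between ORF140 and the N-terminus of ORF140ng.
ORF140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
ORF140NG = (
    "MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)

if __name__ == "__main__":
    diffs = substitutions(ORF140, ORF140NG)
    print(diffs)
    print(f"{len(ORF140) - len(diffs)}/{len(ORF140)} identical positions")
```

For these strings the helper reports seven differences, that is 80 of 87 identical positions, consistent with the 92% figure quoted above.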

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGT L GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAXMIG S TPVALLISV LAALLVLG RK

301 RGESGSTLEK TVDGALAPA C SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>:

1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>:

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT

51 GLPTGSIVND VLVKNFGGT L GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLIDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 
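The listings in this document label each line with the 1-based position of its first residue, followed by groups of ten residues. When a listing is transcribed, those labels give a simple consistency check. The following is a sketch under that assumption; `join_numbered_listing` is a hypothetical helper introduced here for illustration, shown on the first two lines of <SEQ ID 591>.

```python
def join_numbered_listing(lines: list[str]) -> str:
    """Concatenate a listing whose lines look like '51 GGCGGCGGCA ATCATCCTCA ...'.

    The leading integer on each line is taken to be the 1-based position of
    the first residue on that line. A mismatch raises ValueError, since it
    usually means residues were dropped or added during transcription.
    """
    sequence = ""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        label, groups = int(fields[0]), fields[1:]
        expected = len(sequence) + 1
        if label != expected:
            raise ValueError(f"line labelled {label} starts at position {expected}")
        sequence += "".join(groups)
    return sequence


# First two lines of the variant gonococcal sequence <SEQ ID 591>.
LISTING = [
    "1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC",
    "",
    "51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC",
]

if __name__ == "__main__":
    dna = join_numbered_listing(LISTING)
    print(len(dna), dna[:10])
```

The same check applies to the amino acid listings once any spaces inside the ten-residue groups have been closed up.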

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap:

orfl40ng-l.pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 
orf140-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND

orf 140ng-l . pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 
orf 14 0-1 ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 



orf140ng-1.pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I 
orf 14 0-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 

orf 140ng-l .pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I ! I I I ! I I I I 
orf 140-1 ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 

orf140ng-1.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK

orf 140-1 VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 

orfl40ng-l.pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

orf 140-1 RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 



orf 140ng-l .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 140-1 FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 

orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV 

orf 14 0-1 FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 

Furthermore, ORF140ng-1 is homologous to an E. coli protein:



gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454
Score = 210 bits (529), Expect = 1e-53

Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%) 

Query: 88 ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147 

E SGGA+SLA+ R G+KR A +A+ G P+FFD G I + + PI + + A+ K 
Sbjct: 80 EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139 

Query: 148 VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYMLGK 207 

L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY K 

Sbjct: 140 PLKFGLPVAGIMLTVHVAVPPHPGPVAAAGLLHADIGWLTIIGIAIS-IPVGWGYFAAK 198 

Query: 208 VLGRAIHVPVPELL SGGTQDS DPPKE PAKAGT VVAVMLI PMLLIFLNTGV 257 

++ + + E+L G T+ SD P A V ++++IP+ +1 T 

Sbjct: 199 IINKRQYAMSVEVLEQMQLAPASEEGATKLSDKINPPGVA-LVTSLIVIPIAIIMAGT— 255 

Query: 258 SALISEKLVSADETWVQTAKMIGSTPXXXXXXXXXXXXXXGRKRGESGSTLEKTVDGALA 317 

+S L+ + T ++IGS +RG S + AL 

Sbjct: 256 — VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312 

Query: 318 PACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGCFLVALALRIAQGSXXXX 377 

A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS 
Sbjct: 313 TAAWILVTGAGGVFGKVLVESGVGKALANMLQMIDLPLLPAAFIISLALRASQGS— AT 370 

Query: 37 8 XXXXXXXXXXXXXXXGFTDWQLACIVLATAAGSVGCSHFNDSGFWLVGRLLDMDVPTTLK 437 

G Q + LA G +G SH NDSGFW+V + L + V LK 
Sbjct: 371 VAILTTGGLLSEAVMGLNPIQCVLVTLAACFGGLGASHINDSGFWIVTKYLGLSVADGLK 430 



Query: 438 TWTVNQTLIAFIGFALSALLFAIV 461
           TWTV  T++ F GF ++  ++A++
Sbjct: 431 TWTVLTTILGFTGFLITWCVWAVI 454

Based on this analysis, including the identification of the presence of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>: 

1 ..GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC
151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG
201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG
251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG
301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC
351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG
401 TACTGATGTT TTTCCGTCCG ..

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP ..

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRS V VLILIGCIGL IPVAHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSL AAA YPAAFALMLP 

201 LPVLMFF RPW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLD 
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 W GILGVVWML AVLVLLAVN P QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N.
meningitidis:

10 20 30 

or f 141. pep DFG I S P V YLWVAAA FKHL L S PWAAD S YDVA 

orfl41a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA 
40 50 60 70 B0 90 

40 50 60 70 80 90 

R FAGVFFAVIGLTSCGFA GFNFLGRHHGRX VVLILIGCIGLIPVAHF LNPAAAAFAAAGL 
I I I II I M : I M i I I I I I I M I I I I I ' I I I I I I I II I I I . I : : I M I I I I I I I I I I I I 
R FAGVFFAWGLTSCGFA GFNFLGRRHGRS WLILIGCIGLIPTVHF LNPAAAAFAAAGL 
100 110 120 130 140 150 

100 110 120 130 140 

Orfl41.pep VLHGYSLARRR VIAASFLLGTGWTLMSL AAA YPAAFALMLPLPVLMFF RP 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orfl41a VLHGYSLARRR VIAASFLLGTGWTLMSL AAA Y PAAFALMLPLPVLMFF RPWQSRRL MLTA 

160 170 180 190 200 210 

orf 14 la VASLAFALPLMTV YPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKMLLWF 
220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 

1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 



This encodes a protein having amino acid sequence <SEQ ID 598>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG
551 ENILKTTD*



ORF141a and ORF141-1 show 98.2% identity in 553 aa overlap: 



orf141a.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP
orf141-1    MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141a.pep LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVVGLTSCGFAGFN
orf141-1    LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141a.pep FLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
orf141-1    FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141a.pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
orf141-1    GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141a.pep QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD
orf141-1    QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141a.pep WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
orf141-1    WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141a.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
orf141-1    FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141a.pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE
orf141-1    NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141a.pep CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
orf141-1    CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141a.pep SKFALIRKTGENI
orf141-1    SKFALIRKIGENI

Homology with a predicted ORF from N. gonorrhoeae 

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from 
N. gonorrhoeae: 



orf141.pep                               DFGISPVYLWVAAAFKHLLSPWAADSYDVA  30
orf141ng   WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126

orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL  90
orf141ng   RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140
orf141ng   VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246

An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino 
acid sequence <SEQ ID 600>: 



1 MPSEAVSARP LCEYLLHLAI RPFLLTLMIT YTPPDARPPA KTHEKPWLLL
51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI
101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG
151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA
201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL
251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL
301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF
351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF
401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR
451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR
501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA
551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD*

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>:

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN
251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD
301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA






351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENILKTTD*



ORF141ng-1 and ORF141-1 show 97.5% identity in 553 aa overlap:

orf141ng-1.pep MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP
orf141-1       MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141ng-1.pep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN
orf141-1       LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141ng-1.pep FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
orf141-1       FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141ng-1.pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
orf141-1       GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141ng-1.pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD
orf141-1       QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141ng-1.pep WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
orf141-1       WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141ng-1.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
orf141-1       FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141ng-1.pep NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE
orf141-1       NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141ng-1.pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
orf141-1       CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141ng-1.pep SKFALIRKIGENILKTTDX
orf141-1       SKFALIRKIGENIX

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
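
The text does not state which method was used to predict the putative transmembrane domains. As an illustration only, the following minimal sketch scans a protein sequence with a sliding-window mean of Kyte-Doolittle hydropathy values. The scale is the standard published one; the function names, the 19-residue window and the 1.6 cut-off are assumptions chosen for this example and are not taken from the specification.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq.

    Residues outside the scale (e.g. X) are scored as 0.0, which is an
    assumption made for this sketch.
    """
    scores = [KD_SCALE.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows at or above threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean >= threshold]


if __name__ == "__main__":
    # First 50 residues of ORF141ng-1 as listed above (hypothetical usage).
    fragment = "MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTA"
    print(candidate_segments(fragment))
```

Windows flagged by such a scan are only candidates; a dedicated topology predictor would be needed to confirm them.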



Example 72 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>: 

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 



Further work revealed the complete nucleotide sequence <SEQ ID 605>:

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>:



1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF*
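
The correspondence between a nucleotide listing and its amino acid sequence can be checked by conceptual translation. The following is a minimal sketch using the standard genetic code; the function name `translate` and the handling of ambiguous codons (reported as X) are assumptions made for illustration and do not describe the software actually used.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace (as in the 10-base blocks of the listings) is ignored.
    Codons containing anything other than A, C, G or T give 'X'.
    A trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    # First 30 bases of <SEQ ID 605> and the final 15 bases of the listing.
    print(translate("ATGGATAATT CGGGTAGTGA GGCGACAGGA"))
    print(translate("GGCTATACGTTTTAA"))
```

Applied to the start and end of <SEQ ID 605>, the sketch returns MDNSGSEATG and GYTF*, in agreement with the first and last residues of <SEQ ID 606>.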

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae: 



orf142.pep                               QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY  30
orf142ng   RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313

orf142.pep DIFTGRALKKPEFFQSRKWASGFQVGYTF  59
orf142ng   DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
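
The identity figure quoted for an overlap can be recomputed directly from the aligned residues. The following minimal sketch counts identical columns between two aligned strings of equal length; the function name `percent_identity` and the convention that the denominator is the full alignment length (gap columns included) are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical, length, percent) for two aligned sequences.

    Both strings must have the same length; '-' denotes a gap and never
    counts as an identity. The percentage is rounded to one decimal.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    length = len(aligned_a)
    return identical, length, round(100.0 * identical / length, 1)


if __name__ == "__main__":
    # The 59-residue ORF142 / ORF142ng overlap shown above.
    orf142 = ("QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY"
              "DIFTGRALKKPEFFQSRKWASGFQVGYTF")
    orf142ng = ("QSAKWLSGQTLAGTAIGIRGQIKLGGNLHY"
                "DIFTGRALKKPEYFQTKKWVTGFQVGYSF")
    print(percent_identity(orf142, orf142ng))
```

For the overlap above this gives 52 identical positions out of 59, i.e. 88.1%, matching the stated value.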

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF*

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the 
C-terminal end of outer membrane proteins. 
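
The aromatic-Xaa-aromatic motif can be tested for mechanically. The following minimal sketch checks the last three residues of a sequence; treating F, W and Y as the aromatic residues and the function name `has_c_terminal_aromatic_motif` are assumptions made for this example.

```python
AROMATIC = set("FWY")  # assumption: phenylalanine, tryptophan, tyrosine


def has_c_terminal_aromatic_motif(protein):
    """True if the last three residues are aromatic-Xaa-aromatic.

    Whitespace and a trailing '*' (stop marker used in the listings)
    are ignored before the check.
    """
    seq = "".join(protein.split()).upper().rstrip("*")
    return len(seq) >= 3 and seq[-3] in AROMATIC and seq[-1] in AROMATIC


if __name__ == "__main__":
    # C-terminal ends of <SEQ ID 608> and <SEQ ID 606> as listed.
    print(has_c_terminal_aromatic_motif("KWVTGFQVGY SF*"))
    print(has_c_terminal_aromatic_motif("KWASGFQVGY TF*"))
```

Both the gonococcal (YSF) and meningococcal (YTF) C-termini satisfy this check.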



ORF142ng and ORF142-1 show 95.6% identity over 342aa overlap: 

orf142-1.pep MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
orf142ng-1   MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
orf142ng-1   VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS

orf142-1.pep VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
orf142ng-1   VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA

orf142-1.pep PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT
orf142ng-1   PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
orf142ng-1   VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
orf142ng-1   IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng is homologous to the HecB protein of E. chrysanthemi:

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558
Score = 119 bits (295), Expect = 3e-26

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%)

Query: 2 DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 

DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFISAGHS— SRFATSHDAESLQAG 280 

Query: 62 HYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLSV 121 

+ S P+G W +N++ RY + G S F +R+++RD KT ++ 

Sbjct: 281 -FSMPYGYWNLGYNYSQSRYRNTFINRDFPWHSTGDSDTHRFSLSRWFRDGTMKTAIAG 339 

Query: 122 KLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRAP 181 

R +Y++ + L RK + ++H + A F Y G + 

Sbjct: 340 TFSQRTGNNYLNGSLLPSSSRKLSSVSLGVNHSQKLWGGLATFNPTYNRGVRWLGSETDT 399 

Query: 182 EEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHTV 241 

+++ E + WT SA P Y S++ Q++ L ++L +GG ++ 

Sbjct: 400 DKSADEPRAEFNKWTLSASYYHPV TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456 

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296 

RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 

Sbjct: 457 RGF-REQYTSGNRGAYWRNE LNWQAWQ1FVLGNVTFMAAVDGGHLYNHKQDNSTAASLWG 515 



Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
A+G+ + L +G+P+Q V G++VG SF
Sbjct: 516 GAVGMTVASRW LSQQVTVGWPISYPAWLQPDTMWGYRVGLSF 558
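
The percentages on the summary line of the HecB comparison can be reproduced from the raw counts. The following minimal sketch assumes the percentages are obtained by integer truncation; this is an inference from the quoted numbers (151/346 is about 43.6%, reported as 43%), not a statement about the program that produced them, and the function name is hypothetical.

```python
def truncated_percent(count, length):
    """Integer percentage of count over length, truncated (not rounded)."""
    return count * 100 // length


if __name__ == "__main__":
    alignment_length = 346
    for label, count in (("Identities", 88), ("Positives", 151), ("Gaps", 22)):
        pct = truncated_percent(count, alignment_length)
        print(f"{label} = {count}/{alignment_length} ({pct}%)")
```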

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 73

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGC1JACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME
101 KKYRLLIKNN ..

Further work revealed the complete nucleotide sequence <SEQ ID 611>:

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT

601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS
51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRILYRRY SNRV*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 
ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:

                                            10        20        30
orf143.pep                          MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL
orf143a    GAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL

              40        50        60        70        80        90
orf143.pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE

orf143.pep VAQMEKKYRLLIKNN



The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 



1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT
651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA



This encodes a protein having amino acid sequence <SEQ ID 614>: 



1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS
51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRXLYXXL QQPRVKLGRS XGLCSNY*

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 



orf143a.pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA
orf143-1    MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA

orf143a.pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA
orf143-1    DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA

orf143a.pep NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG
orf143-1    NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG

orf143a.pep STKFILVIGGIPDLGKEAFVTLVRXLY
orf143-1    STKFILVIGGIPDLGKEAFVTLVRILY



Homology with a predicted ORF from N. gonorrhoeae 

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae:

orf143.pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL  60
orf143ng   MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL  60

orf143.pep SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110
orf143ng   SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120

An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino
acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGA1SAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI*



Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 



1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TGGAATTCCT CTATGGCGAT GAAAACGGTC

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS
51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME
101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR
151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT
201 LVRILYRRYS NRV*

ORF143ng-l and ORF143-1 show 95.8% identity in 214 aa overlap: 



orf143ng-1.pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEVVSSEKLLA-ADTA  59
orf143-1       MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA  60

orf143ng-1.pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119
orf143-1       DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120

orf143ng-1.pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179
orf143-1       NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180

orf143ng-1.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213
orf143-1       STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG G^CGGGTCAA wTyCCAGCGT 

401 CCGTGGATG. . 
This corresponds to the amino acid sequence <SEQ ID 620; ORF144>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 621>:



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKRQ*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N. 
meningitidis: 



orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF
orf144a    MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID

orf144.pep NTFNRIWRVXXQRPWM
orf144a    NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL
                  130       140       150       160       170

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG
101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG
251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG
301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG
351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC
401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG
451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC
501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG
551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG
601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC
651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA
701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT
751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT
801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG
901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG
951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG
1001 CGCGGCACGG [missing] TCCGGCAGAC AGGGTTGGGT GTTGAAAACG
1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG
1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA
1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT
1201 CAGGCGAAAA AACAGCAGCA ATCTTGA



This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV
201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS XYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL
301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA
401 QAKKQQQS*

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap:



orf144a.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF
orf144-1    MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

orf144a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID
orf144-1    PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID

orf144a.pep NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL
orf144-1    NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL

orf144a.pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS
orf144-1    RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS

orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL
orf144-1    IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

orf144a.pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL
orf144-1    DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL

orf144a.pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408
orf144-1    FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406

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Homology with a predicted ORF from N. gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from 
N. gonorrhoeae: 

orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 

orf144ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60 

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I 
orf144ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 120 

orf144.pep NTFNRIWRVXXQRPWM 136 

orf144ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a 
protein having amino acid sequence <SEQ ID 626>: 



1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLI RTID NAFNRIWRVN TQRPWMMQFL VYWA LLTFGP 

151 LSLGVGISFM V GSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFA AVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 



1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC 

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG 

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG 

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 
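
Listings such as the one above are printed in blocks of ten bases with a running position counter at the start of each row. Before such a listing can be analysed it has to be joined into a single string. The following is a minimal sketch of that step, not part of the original disclosure; the function name `join_listing` and the decision to reject any character outside the IUPAC nucleotide alphabet are assumptions made for illustration.

```python
# Hypothetical helper (not from the source text): join a blocked listing
# into one uppercase sequence string.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def join_listing(text: str) -> str:
    """Drop position counters and whitespace, then validate the alphabet."""
    blocks = []
    for token in text.split():
        if token.isdigit():
            # Position counter such as "51", or a counter split in two ("4 51").
            continue
        blocks.append(token.upper())
    sequence = "".join(blocks)
    unexpected = sorted(set(sequence) - IUPAC_DNA)
    if unexpected:
        # A stray digit or symbol inside a block is reported, not silently dropped.
        raise ValueError(f"unexpected characters in listing: {unexpected}")
    return sequence


# First two rows of the gonococcal sequence shown above.
listing = """
1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG
"""
seq = join_listing(listing)
print(len(seq), seq[:10])
```

Lower-case bases, which this document uses within some listings, are folded to upper case; gap markers such as "." in the partial sequences would be rejected by this sketch and would need separate handling.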

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>: 

1 MTFLQRWQGL ADNKICAFA W FVIRRFSEER VPQAAASMTF TT LLALVPVL 

51 TVMVAVASI F PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL 

101 TAIGSVMLVV TSLMLI RTID NAFNRIWRVN TQRPWMMQFL VYW ALLTFGP 

151 LSLGVGISFM V GSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV 

201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFA AVPF 

251 FLLWLNLLWT LVL GGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL 

301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT 

351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA 

401 QAKKQQQS* 

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap: 

orf144ng-1.pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 
I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf144ng-1.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : i I I : I I I I I I I I I I I I I I I I I I I I I I I I I 
orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 

orf144ng-1.pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 

orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf144ng-1.pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 

orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 

orf144ng-1.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

orf144ng-1.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL 
I I I I I I I :: I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I i I I I I I I : I I 

orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 

orf144ng-1.pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS 

orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 

On the basis of this analysis, including the identification of several putative transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
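
The identity figures quoted in this example (for instance 94.1% in a 406 aa overlap) are the share of aligned positions at which both sequences carry the same residue. The sketch below illustrates that calculation only; it is not the program used to produce the alignments in this document. The function name `percent_identity`, the use of "-" as the gap symbol, the exclusion of gapped columns from the denominator, and the literal comparison of the unknown-residue code X are all assumptions.

```python
# Illustrative sketch; not the alignment software used in the source text.
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of ungapped aligned columns in which both residues are equal."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * matches / columns


# First 20 residues of the ORF144ng-1 / ORF144-1 alignment shown above;
# they differ at one position (W versus L).
orf144ng_1 = "MTFLQRWQGLADNKICAFAW"
orf144_1 = "MTFLQRLQGLADNKICAFAW"
print(percent_identity(orf144ng_1, orf144_1))  # 95.0
```

Applied to the complete aligned sequences rather than this 20-residue excerpt, the same calculation gives the whole-overlap figure.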

Example 75 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 

1 . .AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

201 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>: 



Further work revealed the complete nucleotide sequence <SEQ ID 631>: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA 

451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TGAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EW IGMTVFVV LGMLQFQGA I YSKAVER MLG TVIGLGAGLG VLWL NQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 
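
The correspondence between a nucleotide listing and the amino acid sequence given for it can be checked by reading the DNA three bases at a time. The sketch below is an illustration only, not the software used for the original analysis; it uses the standard genetic code, and the names `CODON_TABLE` and `translate`, as well as the choice to emit X for any codon containing an ambiguous base, are assumptions.

```python
# Illustrative sketch using the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table in TCAG order.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unresolved codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 and last 18 bases of the complete ORF146-1 nucleotide sequence above.
print(translate("ATGAACACCTCGCAACGCAACCGCCTCGTC"))  # MNTSQRNRLV
print(translate("ACACGGGAACACGGCTGA"))              # TREHG*
```

Both results agree with the start (MNTSQRNRLV) and end (TREHG*) of the ORF146-1 amino acid sequence listed above.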

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of 
N. meningitidis: 

10 20 30 

orf146.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 

I I I I I I I I I I I I I I I I I I I I I I I I I 

orf146a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 

40 50 60 70 

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 

I I I I I : I I 1 M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 
orf146a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 
340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 634>: 

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG 

51 EW IGMTVFVV LGMLQFQGA I YSKAVE RMLG TVIGLGAGLG VLWLNQHYFH 

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG 

151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLTDC SKMIAEISNG 

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHS * 



ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 

orf146a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf146-1 MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 

orf146a.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

orf146-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

orf146a.pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 
I I I I I I I I I I ! I I I I I I I M I \ I II I : I I II I I I I I I I I I II I I I II I I I I I I I I II I I I 
orf146-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146a.pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

orf146-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

orf146a.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I 

orf146-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

orf146a.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf146-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf146a.pep RQHLRQSLLETREHSX 

orf146-1 RQHLRQSLLETREHGX 



Homology with a predicted ORF from N. gonorrhoeae 
ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 

orf146.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 30 

orf146ng KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364 

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75 
orf146ng LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409 

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDMWSSQR KRLSGRWLNS 

51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEW IGMT VFVVLGMLQF 

101 QGAIYSNAVE RMLGTVIGLG AGLGVLWL NQ HYFKGNLLFY LTIGTASALA 

151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA 

201 AAKLLPL KST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLE QNMVKMR 

251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK 

301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL 

351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ 

401 SLLETREHG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaCcGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>: 



1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG 

51 EW IGMTVFVV LGMLQFQGA I YSNAVE RMLG TVIGLGAGLG VLWL NQHYFH 

101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG 

151 LMRAMN VLIG AAIAIAAAKL LPL KSTLMWR FMLADNLADC SKMIAEISNG 

201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH 

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING 

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR 

351 TRRKWLDAHE RQHLRQSLLE TREHG* 

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap: 

orf146-1.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 

orf146ng-1 MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV 

orf146-1.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I : I I I I f I I I I I I 

orf146ng-1 LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA 

orf146-1.pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I ! I I I I I I 

orf146ng-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146-1.pep FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf146ng-1 FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 

orf146-1.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

: I I I I I I I I : I I I I I 

orf146ng-1 SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf146-1.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf146ng-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf146-1.pep RQHLRQSLLETREHGX 
I I I I I I I I I I I I I I I I 
orf146ng-1 RQHLRQSLLETREHGX 

Furthermore, ORF146ng-1 shows homology with a hypothetical E.coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839) 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query: 20 YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVVLGMLQFQGAIYSNAVERML 79 

YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPE STWPLVTMWIMGPISFWGNVVPRAFERIG 74 

Query: 80 GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139 

GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++ 

Sbjct: 75 GTVLG S I LGLI ALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIW 131 

Query: 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPLKSTLMWRFMLADNLADCSKMIAEISN 199 

G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 

Sbjct: 132 GSPTGE-IDTALWRSGDVILGSLLAMLFTGIWPQRAFIHWRIQLAKSLTEYNRVYQSAFS 190 

Query: 200 GRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISPSMMEAMQHAHRKIVNXXXX 259 

+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 

Sbjct: 191 PNLLERPRLESHLQKLL— TDAVKMRGLIAPASKETRIPKSIYEGIQTINRNLVCMLEL 247 

Query: 260 XXXXXXXXQSPK— LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR — DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query: 317 EALAEHL — HYQWQ G FLW L S TNMRQE I SALVI LLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 76 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639> 

1 . . GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 

51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 

101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 

151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 

201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC . GCGGTGA 

251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 

301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 

351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 

401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 

451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 

501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 

551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 

601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 

651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 

701 CTTTGTACGA T. . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 



1 . . AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD . . 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
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201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFK V VPVVGASAVM AALSVA GVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKT FETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with hypothetical protein ORF286 of E.coli (accession number U18997) 
ORF147 and E.coli ORF286 protein show 36% aa identity in 237aa overlap: 

Orf147: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
Orf286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102 

Orfl47: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120 

L R RE F + GF+P KS RR 

Orf286: 103 YHLVRTCREAGIRWPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162 

Orfl47: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179 

++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ 4- D + 

Orf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222 

Orf147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237aa overlap with ORF75a from strain A of N. meningitidis: 



orf147.pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
orf75a TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 

40 50 60 70 80 90 

orf147.pep MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 

orf75a MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKVVPVVGASAVMAALSVA 
80 90 100 110 120 130 

100 110 120 130 140 150 

orf147.pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 

orf75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIGATLADMAELFPERRLM 
140 150 160 170 180 190 

160 170 180 190 200 210 

orf147.pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 

orf75a LAREITKTFETFLSGTVGEIQTALAADGNQS^ 
200 210 220 230 240 250 

orf147.pep LTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf75a LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 
260 270 280 290 



ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
Homology with a predicted ORF from N. gonorrhoeae 

ORF147 shows 94.1% identity over a 237aa overlap with a predicted ORF (ORF147ng) from N. gonorrhoeae: 



orf147.pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 

I I I I I I I E I : I I I I I I I I I I I 

orf147ng TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

orf147.pep MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 90 
llll::|:llll:lllllllllllll I I I I I 

orf147ng MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA 145 

orf147.pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I 
orf147ng GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM 205 

orf147.pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 

I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
orf147ng LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265 

orf147.pep LTAELPTKQAAELAAKITGEGKKALYD 237 

orf147ng LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 

An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFK VV PVVGASAVMA ALSVA GVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAE LAA KITGEGKKAL YDLALSWKNK 



Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 



1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAG FK V VPVVGASAVM AALSVA GVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E.coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 

>gi|606086 (U18997) ORF_f286 [Escherichia coli] 

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 

K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct: 2 KQHQSADNSQ— GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123 

RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +WP+ 
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRWPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 

G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 

Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179 

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242 

D+ + E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ + 

Sbjct: 180 EDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238 

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 

EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647> 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTGGGCGt AJ_CAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC . GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA 

651 GTTCATATCA TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 
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951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG 

1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC 

1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT 

1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT 

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 

1301 GGCAAAGGCA CGCTG 

// 

2101 GATAAAG 

2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC 

2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT 

2251 TAGTGCAAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA 

2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC 

2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG 

2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG 

2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG 

2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG 

2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg 

2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT 

2651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA 

2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA 

2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC 

2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA 

2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT 

2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG 

2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC 
3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG 

// 

3551 TTAGAC CGCGTATTTG CCGAAGACCG 

3601 CCGCAACGCC GTTTGGACAA GCGGCATCCG GGACACCAAA CACTACCGTT 
3651 CGCAAGATTT CCGCGCCTAC CGCCAACAAA CCGACCTGCG CCAAATCGGT 
3701 ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC GGCATCCTGT TTTCGCACAA 
3751 CCGGACCGAA AACACCTTCG ACGACGGCAT CGGCAACTCG GCACGGCTTG 
3801 CCCACGGCGC CGTTTTCGGG CAATACGGCA TCGACAGGTT CTACATCGGC 
3851 ATCAGgCGCG GGCGCGGGTT TTAGCAGCGG CAGCCTTTcA GACGGCATCG 
3901 GAGsmAAAwT CCGCCGCCGC GTGCtGCATT ACGGCATTCA GGCACGAtAC 

3951 CGCGCCGgtt tCggCGgATt CGGCATCGAA CCGCACATCG GCGCAACGCg 

4001 ctATTTCGTC CAAAAAGCGG ATTACCGCTA CGAAAACGTC AATATCGCCA 
4051 CCCCCGGCCT TGCATTCAAC CGcTACCGCG CGGGCATTAa GGCAGATTAT 
4101 TCATTCAAAC CGGCGCAACA CATTTCCATC ACGCCTTATT TGAGCCTGTC 
4151 CTATACCGAT GCCGCTTCGG GCAAAGTCCG AACACGCGTC AATACCGCCG 
4201 TATTGGCTCA GGATTTCGGC AAAACCCC-CA GTGCGGAATG GGgCGTAAAC 
4251 GCCGAAATCA AAGGTTTCAC GCTGTCCCTC CACGCTGCCG CCGCCAAAGG 
4301 CCCGCAACTG GAAGCGCAAC ACAGCGCGGG CATCAAATTA GGCTACCGCT 
4351 GGTAA. . . 

This corresponds to the amino acid sequence <SEQ ID 648; ORF1>: 

1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 

151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE 

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 

801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT 

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 

1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 

1451 * 
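
As a cross-check on the correspondence between the partial nucleotide sequence and the amino acid sequence above, the reading frame can be translated with the standard genetic code. The following Python sketch is illustrative only and is not the software used in the original analysis. The function name `translate` is hypothetical, and the sketch assumes frame 1 and that any codon containing an unresolved position (shown as `.` or as a lower-case ambiguity code such as `k`, `w`, `s`, `m` or `r` in the listing) is reported as X.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unresolved codons become 'X'."""
    # Remove the block spacing of the listing and normalise the case.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Any codon with a non-ACGT character is absent from the table.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 60 nucleotides of the partial sequence above.
fragment = "ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA AACCGGTCGC"
print(translate(fragment))  # MKTTDKRTTETHRKAPKTGR
```

Under these assumptions the unresolved position in `ATCCGCTTCT C.GCTGCT` yields `IRFXAA`, which agrees with residues 21-26 of the listed amino acid sequence.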

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG 

1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 

1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 

1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 

2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 

2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 

2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 

2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 

2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 

3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 

3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 

3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 

3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 

3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 

3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 

3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 

3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC 

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 

3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 

3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 

3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC 

4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC 
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 
4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC 
4351 ATCAAATTAG GCTACCGCTG GTAA 

This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 

151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 

351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE 

401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 

451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 

601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK 

701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 

751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 

851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 

951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 

1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 

1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 

1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 

1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 

1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 

1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 

1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 

1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 

1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 

1451 IKLGYRW* 
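
The numbering of the complete nucleotide sequence and of the amino acid sequence above can be checked against each other with simple arithmetic: an open reading frame of n nucleotides that ends in a stop codon encodes n/3 - 1 residues. The short Python sketch below is illustrative only; the function name `orf_protein_length` is hypothetical, and the line positions are read from the listings above.

```python
def orf_protein_length(nt_length: int, has_stop: bool = True) -> int:
    """Number of residues encoded by an ORF of nt_length nucleotides."""
    if nt_length % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = nt_length // 3
    # The terminal stop codon does not encode a residue.
    return codons - 1 if has_stop else codons


# The last line of the complete nucleotide sequence starts at position 4351
# and holds 24 nucleotides, so the ORF is 4351 + 24 - 1 = 4374 nt long.
orf_nt = 4351 + 24 - 1
print(orf_protein_length(orf_nt))  # 1457
```

The result agrees with the amino acid listing, whose last line starts at residue 1451 and holds seven residues before the terminating `*`.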

Computer analysis of these sequences gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456 aa overlap with an ORF (ORF1a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf1.pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf1a MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 

10 20 30 40 50 60 



orf 1 . pep 



70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYN 
orf1.pep 
orf la 



NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 
NVDFGAEGXN-PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 



or f 1 . pep MDGRKYI DQNNYPDRVRI GAGRQYWRS DE DE P NN 

orf la MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 

220 230 240 250 260 

orf 1 .pep RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

orf la SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 

270 280 290 300 310 320 

orf 1. pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

I I I I I : I : I I I I : I : I I I : I I : : I I : : : I I I I I : : : I : I I : I I : : I I : I I : 
orf la DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 



orf 1 . pep 
orf la 



SLSETARE PVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 
SLNETDKE PVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 



orf 1. pep 

orfla 



VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTL — 



orf 1 . pep 
orfla 



orfl.pep 
orfla 



orf 1 .pep 
orfla 



orfl.pep 
orfla 



orf 1 . pep XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: I I : I I I I I I I I I I I I I I 1 : I I : I I I I I I I 

orfla TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
720 730 740 750 760 770 



orfl.pep 



490 500 510 520 530 540 

GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD 



NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 



NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 



NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS— 



-LLSVTPPTSVESRFNTLTVNG 



orfl .pep 
orf la 



orfl . pep 
orf la 



orfl. pep 
orf la 



VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 



orfl . pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 
: i I I I I I I I I I I I I I I I I I I II I II I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I 
orf la XFDDGIGNSARLAHGAVFGQYGIGRFDIGI STGAGFSSGXLSDGIGGKIRRRVLHYGIQA 

1260 1270 1280 1290 1300 1310 



orfl. pep 
orf la 



RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 



orf 1 .pep 
orf la 



The complete length ORF1a nucleotide sequence <SEQ ID 651> is: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 
101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 
201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 
301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 
401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 
451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 
501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 
551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 
601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 
651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 
701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 
751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA 
801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT 
851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 
901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 
951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 
1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 
1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 
1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 
1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 
1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 
1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 
1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 
1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 
1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 
1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC 
1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 
1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 
1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 
1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 
1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG 
1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 
1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 
1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 
1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 
1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 
2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 
2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG 
2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 
2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 
2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 
2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 
2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 
2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 
2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 
2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 
2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 
2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 
2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 
2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 
2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 
2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 
2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 
2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 
2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA 
2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT 
3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 
3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 
3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA 
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG 
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA 
3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 
3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG 
3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA 
3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 
3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 
3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 
3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT 
4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC 
4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT 
4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 
4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 
4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA 
4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 
4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 652>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 
151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 
201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ 
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN 
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL 
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG 
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH 
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG 
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL 
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS 
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ 
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN 
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN 
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT 
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD 
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE 
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG 
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ 
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR 
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG 
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP 
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL 
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW* 

A transmembrane region is underlined. 



ORF1-1 shows 86.3% identity over a 1462 aa overlap with ORF1a: 

10 20 30 40 50 60 

orf la . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFG IN YQYYRDFAEN 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 

orf 1-1 MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFG IN YQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf1a.pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 

orf1-1 KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 
70 80 90 100 110 120 

130 140 150 160 170 179 

orf la . pep NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 

orf 1-1 NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
130 140 150 160 170 180 
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orf la. pep RGNTYSDKEKYPERVRIG3GHHYWRYDDDKHGDL — SY, 

orfl-1 DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAY3WLVGGNTFAQNGSGG 
190 200 210 220 230 240 



orf la. pep 
orfl-1 



GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 



orf la. pep 
orfl-1 



FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 
FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 



orf la. pep 
orfl-1 



VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 



orf la. pep 
orfl-1 



FEGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 
FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 



480 490 500 510 520 530 

orf1a.pep SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

orfl-1 SVGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 



orfla.pep 
orfl-1 



HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 



orfla.pep 
orfl-1 



orfla.pep 
orfl-1 



WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 



orfla.pep 
orfl-1 



orfla.pep 
orfl-1 



GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNG 



orfla.pep 
orfl-1 



SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 
SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
orf1a.pep 
orfl-1 



TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS-- 
I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I : : I : I I I I I I I I 
TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 



1010 1020 1030 1040 1050 1060 

orf la. pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl-1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 

1020 1030 1040 1050 1060 1070 

orf la. pep 
orfl-1 



orf la. pep 
orfl-1 



orf la. pep 
orfl-1 



orf la. pep 
orfl-1 



orf la. pep 
orfl-1 



1190 1200 1210 1220 1230 1240 

QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 

QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 

I I I I I I : I I I I I I I I I I I I I I I I I I I I I I II 1111:1111111 I I I I II I I I I I I I I 
HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 

1310 



orf la. pep 
orfl-1 



orf la. pep 
orfl-1 



1370 1380 1390 1400 1410 1420 

KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II 
KPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHA 
1380 1390 1400 1410 1420 1430 

1430 
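
The percentage identity figures quoted for these comparisons are ratios of identical aligned positions to the length of the overlap. The Python sketch below shows one way to compute such a figure from two gapped rows of an alignment. It is illustrative only and is not the program used to generate the alignments above. The function name `percent_identity` is hypothetical, and two conventions are assumed: gap columns count towards the overlap length, and an unresolved residue X is treated as a mismatch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (percent identity, overlap length) for two gapped rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    # Columns in which both rows carry a gap are not part of the overlap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    # A column is identical when both rows show the same non-gap residue.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns), len(columns)


# Residues 121-180 of the ORF1a / ORF1-1 comparison above.
row_a = "NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM"
row_b = "NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM"
identity, overlap = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity in {overlap} aa overlap")
```

Applied to a single 60-column block, the value differs from the whole-alignment figure, since identity is not uniform along the protein.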



Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number P45387) 
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in a 450 aa overlap: 

orfl 
hap 



FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 
F +L C+S GI QAWAGHTYFGI+YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG 
FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 



orf1 83 KSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMIDFSVVSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
hap 66 TSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 
orf1 


143 


hap 


125 


orfl 


203 


hap 


185 


orfl 


223 


hap 


245 


orfl 


278 


hap 


305 


orfl 


335 




364 


orfl 


394 




424 


Amino acids 715- 


Orfl 


41 


hap 


733 


orfl 


99 




793 


orfl 


159 


hap 


853 


orfl 


219 




900 


orfl 


279 


hap 


960 



■ HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 



] Y+PR+ G+NI D+GKG L + +NINQGAGGLYF+G+F V +NN TWQGA 



GV I +D+TV WKV+ NDRLSKIG GTL 



TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 



L L+N+T+TLNSAY 



GKL+GQGTF+F S LFGY+S DKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 
GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEQLTLIESLDNKP 959 

LSENLNFTLQNEHVDAGA 296 
LS+ L FTL+N+HVDAGA 
LSDKLKFTLENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in a 259 aa overlap: 



Orfl 


1 


hap 


1135 


orfl. 


61 


hap 


1195 


orfl 


121 


hap 


1255 


orfl 


181 


hap 


1315 


orfl 


241 




1375 

Homology with a predicted ORF from N. gonorrhoeae 

The blocks of ORF1 show 83.5%, 88.3% and 97.7% identity over overlaps of 467, 298 and 259 aa, 
respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae: 



orfl.pep 

orf 1 .pep 
orf lng 
orfl.pep 
orf lng 
orfl.pep 

orf 1 .pep 
orf lng 
orf 1 .pep 

orf 1 . pep 
orf lng 
orf 1 . pep 
orf lng 
orf 1 . pep 

orfl.pep 

orfl.pep 
orf lng 
orfl.pep 

orfl.pep 

orfl.pep 
orf lng 
orfl.pep 
orf lng 
orfl.pep 
orf lng 



MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 60 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 60 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGVQYIVSVAHNGGYN 120 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALAGDQYIVSVAHNGGYN 120 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 

I I I I I I I I I III :l:|llllllllll:|llllllllllll Ill 

NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 179 



MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 239 

GSPMFIYDA QKQKWL IN GVLOTGNPYIGI 

II I I I I I I I I I I I I I I I 

GGTWLGSEKIKHSPY GFLPTGGSFGDSGSPMFIYDA QKQKWLING\ 

FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 
I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I : I I I : I I I : I I I : I III I I I I I I 
FQLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 

I I MIIM : MINIMI 

VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 422 
I : I : I I I I I : I II I I I II I II I I : I I I I I I I I I II I I I I 

FEGNFTVSPKKNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 479 
// 

DKVTASLTKTDISGNVDLADHAHLNLTGLA 744 

Ml I I I M I I : 111:1 I 

FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 774 

TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 803 

I : I I I I :::MI : I I I I I I I II lllllll::| 

TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 833 

VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 863 

I II I I II II II I I II I I I II I I I I II I I I I : I I I I I : I I I I I I I I I 

VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 

LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 923 

I I I I : I I I I I I I I I I I II I II I I II 111:11111 111:1 

LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 

II I I I I II I II I I : I I I I I I 

SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

VVEGKDNKPLSENLNFTLQNEHVDAGAW 1011 

II I II II I I I I I II 

VVEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 

I I II I I I I I I I I I I I I I I I 

AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 1299 

orf1.pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 

orf1ng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYPAGFGGFGIEPHIGATRYFVQKADY 1359 

orf1.pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

orf1ng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orf1.pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 

orf1ng AQDFGKTRSAEWGTOAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1468 

The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 

51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 

401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 

451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 

551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 

601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 

2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 

2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC 
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC

2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 

3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 

3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg 

3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 

3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 

3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 

3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 

3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG 

3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG 

3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG 

3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC 

3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC 

3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA 

3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC 

3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC 

3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 

3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC

3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 

3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 

3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG 

4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC 

4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT 

4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG 

4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC 

4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA

4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG

4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG

4401 CTGGTAA

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS 

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK 

401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK 

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD 

601 ATKTNGRLNL NYQPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK

701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS

751 LSKTDIRGNV SLADHAHLNL TGLATLNGNL SAGGDTHYTV TRNATQNGNL

801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS 

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 

951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 

1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 

1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK

1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
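The motif annotation above can be checked with a simple pattern scan. The following is a minimal sketch, assuming the PROSITE-style consensus [AG]-x(4)-G-K-[ST] for ATP/GTP-binding site motif A (P-loop); the pattern, the function name `find_p_loops` and the test fragment are illustrative assumptions and are not taken from the original analysis. Whether the match it reports is the site underlined in the original figure is not confirmed here.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST]. A lookahead is used so overlapping matches are reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return (1-based start, matched text) for every P-loop match."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 281-300 of the ORF1ng protein, as they appear in the alignment rows.
fragment = "WLINGVLQTGNPYIGKSNGF"
hits = find_p_loops(fragment)
```

Adding the fragment offset (280) to the reported start gives the position in the full-length protein.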



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap:
orf1-1.pep
orflng-1 



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN



orfl-l.pep 
orflng-1 



orfl-l.pep 
orflng-1 



orfl-l.pep 
orflng-1 



orf 1-1 .pep 
orflng-1 



orfl-l.pep 
orflng-1 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALVGDQYIVSVAHNGGYN 

I HUM: I I I I I I I I I 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALAGDQYIVSVAHNGGYN 
70 80 90 100 110 120 

130 140 150 160 170 180 

NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

I! I II I I 1:1:1 I I I Ml I h hi M : I I M I I 

NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM

130 140 150 160 170 180 

190 200 210 220 230 240 

DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG

II II I 1:1111111 I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG

190 200 210 220 230 240 

250 260 270 280 290 300 

GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I I I I I I I I II II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 



310 



320 



330 



340 



350 



360 



QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

I I I I II I I I I I I I I I hlllll 111:111:111: I II 

QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
310 320 330 340 350 360 



orf 1-1 .pep 

orflng-1 



QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGKNISFIDEGKGELILTSNINQGAGGLYF



orfl-l.pep QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 

orflng-1 EGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSVS 
430 440 450 460 470 480 



orf 1-1 .pep 
orflng-1 



VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 
VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 



orf 1-1 .pep 
orflng-1 



SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 



orf1-1.pep TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ

: I I I I I I I I I Ml I I I I I II II I I I I II II I I I I I II II I I I I I I II I II : : II: 
orflng-1 ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 
610 620 630 640 650 660 



orfl-l.pep 
orflng-1 
730 740 750 760 770 780

orfl-l.pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 

I I I I I I I I I I : I : I I I I I : I I I I I I I : I I I I I I I I I INN 

orflng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820 830 840 

orfl-l.pep SANGDTRYTVSHNATQKGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 

I I : I I I : I I I :: I I I I I I Mil! I I I I I I I :: I I I I ! I I I 

orflng-1 SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 

790 800 810 820 830 840 

850 860 870 880 890 900 

orfl-l.pep LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

II I I I II I I I I I I I I I I I I I I I I : I I I I I : I I I I I I I I I I I I I II I I I I I I I I I 

orf1ng-1 LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL

850 860 870 880 890 900 

910 920 930 940 950 960 

orfl-l.pep GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I 
orflng-1 GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

orf1-1.pep VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDN

I I I I I I I I I I I I I I I I I I I : I I I I I I I I 

orf1ng-1 VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN

960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 

orfl-l.pep KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

I I I I I IMIII I I I I I 

orflng-1 tplsenlnftlqnehvdagawryqlirkdgefrlhnpvkeqelsdklgkageteaaltak 
1020 1030 1040 1050 1060 1070 



orfl-l.pep 

orflng-1 



orfl-l.pep 
orflng-1 



orf 1-1 .pep 
orflng-1 



orf 1-1 .pep 
orflng-1 



1310 1320 1330 1340 1350 1360 

or f i-i . pep ggkirrrvlhygiqaryragfggfgiephigatryfvqkadyryenvniatpglafnryr 

orflng-1 RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 



orf 1-1 . pep 
orflng-1 
In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap:



orflng-l .pep 
p45387 



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 
I : I : I : I : I I : II I I I I I I I I : I I I I I I I I I I 
MKKTVFRLNFLTACISLGIVSQAWAGHTYFGIDYQYYRDFAEN 



orflng-l .pep 
p45387 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSVVSRNGVAALAGDQYIVSVAHNGGYN
KGKFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSVVSRNGVAALVENQYIVSVAHNVGYT 



orflng-l .pep 
p45387 



NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 



orflng-l .pep 
p45387 



DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG
— QVAGAYHYLTAGNTHNQRGAGN 



orflng-l . pep 
p45387 



GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
I II:: I : II II : I I I I I I I I I I I I I : I I I I I I I I : I : III: II II I 
GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 



orflng-l . pep 
p45387 



QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 



orflng-l. pep QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
I 1:11 : : I : : I I I I I I I I I I : : I : I : : I I I : : I : I I I I I I I I I 
p45387 TLANMSLPLKEKDKVHNPRYDGPNIYS PRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 



orf1ng-1.pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV



orflng-l. pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
I I I I I I I I I : I I I I I I I : I I I I I I I I I I I i I I I I I I I I : I I : I I : I I I I I I I I I I I I I 
p45387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 
450 460 470 480 490 500 



orflng-l. pep 
p45387 



HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT-TGNN-NNLDSKKEIAYNGWFG 
600 610 620 630 640 650

orflng-l.pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 



720 730 740 750 760 770 

orf1ng-1.pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN
:|:|::llllllllllll:| : : I 1 ill I = 11 = 1 |:::|:|:| |: III II 
p45387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN
690 700 710 720 730 740 

780 790 800 810 820 830 

orf1ng-1.pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG
II:: :::::|:|||||:l I 

,45387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 

840 850 860 870 880 890 

orf1ng-1.pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG

: : Mil: hi::: I II I I I : I : I I : : I I : I : : I : I I I : : I : : : I I : I I 
p45387 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD
780 790 800 810 820 830 



p4 53B7 TTLQNLTLNNSTITLNSAY S AS SNNT PRRRS — 

840 850 860 

960 970 980 990 1000 1010 

orf1ng-1.pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN
II II I : I I I M : I I I I II : I I II I : : : : I I I Ml II M II : I I II I : I M II I 
p4 5387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYILSVRNTGKEPETLEQLTLVESKDN 
880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

orflng-l.pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 



1080 1090 1100 1110 1120 1130 

orflng-l.pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEKKRV 



1200 1210 1220 1230 1240 1250 

orflng-l.pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 

p4 5387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

orf1ng-1.pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG

p4 5387 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 
orf1ng-1.pep
p45387 

1380 1390 1400 1410 1420 1430 

orf lng-1. pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 

I ::||:| |:::||: II: : : I : I : : : : : I : I 11:11111: : I 

p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV 
1300 1310 1320 1330 1340 1350 



orf lng-1 .pep 
p45387



Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>: 



1 . . AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC

401 GTTTGAAAGT GTTCGGCGCA TAA

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 

1 . . KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 

101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 

1 . . CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-l>: 

1 . . LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 
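The correspondence between a nucleotide listing and its amino acid listing can be reproduced by translating in frame 1 with the standard genetic code. The sketch below is illustrative only: the function name `translate`, the handling of ambiguous codons as X, and the choice of the standard code table are assumptions, not a description of the software used for the original analysis.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA listing in frame 1; whitespace is ignored.

    Codons containing an unreadable base (for example '.') give 'X',
    and stop codons give '*'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 33 nucleotides of the further partial sequence listed above.
start = translate("CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAA")
```

Applied to the first 33 nucleotides of the further partial DNA sequence, this gives LRAVVPADSFE, the start of the corresponding amino acid listing.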

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. 
meningitidis: 

10 20 30 
KVWQFVEXPLRAVVPADSFEPTAQKLNLFK

I I I I I I I II I I I I I Ml 

QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK



AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY

I I I I I I IN 

AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
100 110 120 130 140 150 

100 110 120 130 140 

NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 



The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 



1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC

601 GCATAA



This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 

51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH

151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG

201 A*



ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 



TPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFKAGAATILFY

I I I I I I I I I I I I I I I 

LRAVVPADSFEPTAQKLNLFKAGAATILFY



110 120 130 140 150 160 

EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 1 I I I I I 
EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA



Homology with a predicted ORF from N. gonorrhoeae 

ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from N. gonorrhoeae:
orf6.pep KVWQFVEXPLRAVVPADSFEPTAQKLNLFK

I I I I I I I II I : I I I 

orf6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK

orf6.pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY

orf6ng AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY

orf6.pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140 

orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174 
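The identity figures quoted for these alignments can be recomputed row by row. The sketch below assumes that identity is counted as identical residues divided by aligned (non-gap) positions and that gaps are written as "-"; the function name `percent_identity` and this definition are assumptions for illustration, since the original alignment program and its exact scoring are not stated.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over aligned, non-gap columns of two alignment rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # a gap column is not part of the overlap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# Last row of the ORF6 / ORF6ng alignment above (50 aligned residues).
orf6_row = "NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA"
orf6ng_row = "NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA"
row_identity = percent_identity(orf6_row, orf6ng_row)
```

This covers a single row only; the figure quoted for the full overlap would be obtained by concatenating all rows of each sequence before calling the function.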

The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 



1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga

501 acgtttgAAA GTGTTCGGCG CATAA



This encodes a protein having amino acid sequence <SEQ ID 662>:

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA
51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA
101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI
151 GGIEGAAGEK VFEPVAERLK VFGA*

ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap: 

10 20 30 

orf6-1.pep LRAVVPADSFEPTAQKLNLFKAGAATILFY

I : I I I I I I I I I I I I 

orf6ng PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY



EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 

I I I I I I I I I II I I . I I I I I I ! I I I I I I : I I I I I I I I I I I I : I I 

EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA



90 



100 



110 



100 110 120 130 

KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

'I'll II III II ,11111 I: MINIMI 

KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX
140 150 160 170 



It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>:

1 . . GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGi CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 
251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC
301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 
351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 
401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 
451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT
501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 
551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 
601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC. . 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 . . GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL
51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 
101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 
151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 
201 QDWKLKAEYD Y. . 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-l>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK*



Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 

ORF23 and PupB protein show 32% aa identity in 205 aa overlap: 

Orf23 6 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65

++RG I NY+++G+P + L D + + A ++RVE+VRG GL+ G G PSAT+NL+RK 
PupB 215 WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273 

Orf23 66 RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125 

R T + + EAGN +G DVSG L +RGR V+ + 

PupB 274 RPTAEAQASITGEAGNWDRYGTGFDVSGPLTETGNIRGRFVADYKTEKAWIDRYNQQSQL 333 

Orf23 126 LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN 183

+YGI E+D++ T + Y + D+PL + S G T N A +W+ 

PupB 334 MYGITEFDLSEDTLLTVGFSY--LRSDIDSPLRSGLPTRFSTGERTNLKRSLNAAPDWSY 391

Orf23 184 SHHRALNLFAGIEHRFNQDWKLKAE 208 

+ H +FIE+ WKE 
PupB 392 NDHEQTSFFTSIEQQLGNGWSGKIE 416 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N. meningitidis:

10 20 30 

orf23 .pep GYNYLFARGSRIANYQINGIPVADALADTG 

II I I I I I I 

orf23a QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG 
90 100 110 120 130 140 

40 50 60 70 80 90 

orf23.pep NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf23a NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD



ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD



orf 23. pep 
orf23a 



The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 
1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 668>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA

201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 

10 20 30 40 50 60 

orf23a.pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf23-1 MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
10 20 30 40 50 60 

PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG 

I I I I I I I I I I I I 1 : I I I I I I I I I I I I I I I I I I I I I I 

PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
70 80 90 100 110 120 

130 140 150 160 170 180 

SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTR 

I I I I I I I I I I I I Ill II II I I I 

SRIANYQINGIPVAOALADTGNANTAAYERVEWRGVAGLLDGTGEPSATWLVRKRLTR 

130 140 150 160 170 180 

190 200 210 220 230 240 

KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 

I I I I I I I I I I II: I 1:11 

KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 

250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAK3TADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

LEYDIAPQTRVHAGMDYQQAKSTADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

250 260 270 280 290 300 

310 320 330 340 350 360 

NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 

I I I I I I I I I I I I I I I I I I I I 

NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 

310 320 330 340 350 360 

370 380 390 400 410 420 

SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERS I IPNAIPNAYEFSRTGAYPQPAS 

I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

SASVSLIGKYRLFGREHDL I AGING YKYASNKYGERS I IPNAIPNAYEFSRTGAYPQPAS 

370 380 390 400 410 420 

430 440 450 460 470 480 

FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT 
I I I I I I I I I I I II II I I I I I I I I I I I I I I I I II I I I : I I I I I I I I I I I I I I I I I I I I I I I 
FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I Ill I I I I I I I I 

PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 



550 



560 



570 



580 



590 



600 



AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I 
AAVYRARKNNLATAAGRDFSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 



orf 23a. pep 
orf23-l 



DQDGSRLNPDSVPERSFKLFTAYHFAP 



I GAGVRWQSETHT D PATLRI PN PAAK 



670 680 690 700 710 720 

orf 23a . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I Ill 

orf 23-1 ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf 23a. pep 
orf23-l 
Homology with a predicted ORF from N. gonorrhoeae

ORF23 shows 93.4% identity over a 211 aa overlap with a predicted ORF (ORF23.ng) from N.
gonorrhoeae:



orf23.pep 
orf 23ng 


GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 I 1 1 1 1 1 1 

SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPD 


51 
60 


orf23.pep 


GTGEPSATTOLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 


111 


orf23ng 


GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 


120 


orf 23 .pep 


GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 


171 


orf23ng 


GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 


180 


orf 23 .pep 


GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 


211 


orf23ng 


GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 


240 



The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 



1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE
51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV
101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH
151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL
201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY
251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA
301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL
351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL
401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA
451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ
501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL
551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR
601 TQPDRHSYGA LRTVNAAFTY RFK*

Further work revealed the complete nucleotide sequence <SEQ ID 671>:



1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC
501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC
901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG
1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC
1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA
1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA
1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG
1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT
1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA
1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG
2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA



This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*


ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:



MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 
MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 



PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
PFGLPMTLRE I PQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 



SRIANYQINGIPVADALADTGNAKTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : II 
SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLPDGTGEPSATVNLVRKHPTR 
130 140 150 160 170 180 

190 200 210 220 230 240 

KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I I 

KPLFEVRAEAGNF.KHFGLGADVSGSLNAE3TLRGRLVSTFGRGDSWRQLERSRDAELYGI 

190 200 210 220 230 240 

250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 
250 260 270 280 290 300 



NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
orf23-1.pep
orf23ng-1



orf 23-1 .pep 
orf23ng-l 



430 440 450 460 470 480 

FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 



orf 23-1 .pep 
orf23ng-l 



orf 23-1 .pep 
orf23ng-l 



orf23-l .pep 
orf23ng-l 



550 560 570 580 590 600 

AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

I I I I I I I I I I I I I I I I I! I I I I Ill I 

AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 



610 



620 



630 



640 



650 



660 



DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 



orf23-l .pep 
orf23ng-l 



orf 23-1 .pep 
orf 23ng-l 



TYRFKX 

1 1 1 1 1 1 

TYRFKX 



In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-FERRIOXAMINE B
AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-coprogen,
Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90

Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)



LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP VADALADTGNANTAA 14 7 

G S+ SDRA Y ++RG +1 NY ++GIP + DAL+D A 
ENTLGISKSQADSDRALY— - YSRGFQIDNYMVDGIPTYFESRWNLGDALSDM AL 154 



+ERVEWRG GL GTG PSA +N+VRKH T - 



2 67 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 32 6 



55 




38 




Sbjct: 


43 






96 


60 


Sbjct: 


103 




Query: 


148 


65 


Sb j ct : 


155 




Query: 


207 




Sbjct: 


215 


70 


Query: 


267 
Sbjct:


275 


WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINfCVFMTLKQQFADTWQATLNATHSEVE 


334 




Query: 


327 




374 


5 




F + Y A V D ++ PG+ W++ R A + G Y LFG 






Sbjct: 


335 


FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 


394 




Query: 


375 


REHDLIAGINGYKYASNKYGER — S 1 1 PNAI PNAYEFSRTGAYPQPSS FAQT I PQYDTRR 


432 






R+H+L+ G Y +N+Y +1 P+ I + Y F+ G +PQ Q++ Q DT 




10 


Sbjct: 


395 


RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN— GNFPQTDWSPQSLAQDDTTH 


451 




Query: 


433 


QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 


491 








Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 




15 


Sbjct: 


452 


MK S L YAATRVT LADPLHLI LGARYTNWRVDT LTYSMEKNHTT PYAGLVFDIND 


504 




Query: 


492 


XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 


551 








F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 






Sbjct: 


505 


NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 


564 


20 


Query: 


552 


ATAAGR— DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 


608 








A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 






Sbjct: 


565 


AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 


624 


25 




609 


PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 


668 






P ++P + K+FT+Y L P P T+G GV Q +TD P RA 






Sbjct: 


625 


P-N LPRTTVKMFT S YRL- P VMPE - LTVGGGVNWQNRVYT DT V TPYGTFRA E 


672 




Query: 


669 


QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 


30 






Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 




Sbjct: 


673 


QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 72 9 
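
The percentages in the summary line of the BLAST report above can be checked directly from the counts it quotes. The following is a minimal sketch in Python, not part of the original analysis; the function name `blast_percentages` is hypothetical. It assumes the percentages are truncated to whole numbers rather than rounded, which is consistent with the figures shown (228/717 is about 31.8%, reported as 31%).

```python
import re


def blast_percentages(summary: str) -> dict:
    """Recompute whole-number percentages from 'Name = matched/total' fields."""
    result = {}
    for name, matched, total in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", summary):
        # Integer division truncates, so 31.8% is reported as 31.
        result[name] = int(matched) * 100 // int(total)
    return result


line = "Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)"
print(blast_percentages(line))
```

Rounding to the nearest integer would instead give 32% and 49% for the first two fields, so truncation is the reading that agrees with the report.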



Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF23-1 (77.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen.



Example 80



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG . . 

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG. . 
Further work revealed the complete nucleotide sequence <SEQ ID 675>:

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC
351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA
501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG
551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG
601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT
651 AGCGCAGCCG AAACCTTCGG GCCTGATTTC CGCCGTGCGT TTGACGGTTT
701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT
801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG
851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA


This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N.
meningitidis:



MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAI I SXPTEQTAVIASSLSNVSTPASAAA 

Ill I I I I I I I I I I I I I 11111:11111:1111 

MRTAWLLLIMPMAASSAMMPEKVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 



70 80 90 100 110 120 

Orf24a.pep I I PSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 

orf24 IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf2 4a.pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

orf24 TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 
II I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I III I I I I I I I I 
orf24 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf24a.pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 

OEf2 4 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
250 260 270 280 290 300 
orf24a.pep KVCATLTX
orf24 KVCATLTX
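
The identity figures quoted for alignments such as the one above are ratios of identical positions to overlap length. The following Python sketch is illustrative only and is not the program used to produce the alignments. The function name `percent_identity` is hypothetical, the two strings are the first 60 residues of ORF24a and ORF24-1 retyped from SEQ ID 678 and SEQ ID 676, and gaps are not handled.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with the same residue in two equal-length, ungapped strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 60 residues, written in blocks of ten as in the sequence listings.
orf24a = ("MRTAVVLLLI" "MPMAASSAMM" "PEMVCAGVSP"
          "GTAIISXPTE" "QTAVIASSLS" "NVSTPASAAA")
orf24_1 = ("MRTAVVLLLI" "MPMAASSAMM" "PEMVCAGVSP"
           "GTAIISKPTE" "QTAVMASSLS" "SVSTPASAAA")

print(round(percent_identity(orf24a, orf24_1), 1))  # 95.0
```

An ambiguous residue (X) is counted here as a mismatch unless both sequences carry X at that position; that convention is an assumption of this sketch.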

The complete length ORF24a nucleotide sequence <SEQ ID 677> is: 



1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA
101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC
151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG
451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA
501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG
601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT
701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG
751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT
801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC
901 AAAGTTTGCG CCACGCTGAC GTAA



This encodes a protein having amino acid sequence <SEQ ID 678>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS

51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA

201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP 

251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS 

301 KVCATLT* 

It should be noted that this protein includes a stop codon at position 198. 
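
An internal stop of this kind can be located by scanning the translated sequence for the `*` symbol. The following Python sketch is illustrative only; the function name `stop_positions` and its `first_position` parameter are hypothetical. It uses the row of SEQ ID 678 that starts at residue 151, retyped without spaces.

```python
from typing import List


def stop_positions(protein: str, first_position: int = 1) -> List[int]:
    """Return the 1-based positions of '*' (stop) symbols in a protein string."""
    return [first_position + i for i, aa in enumerate(protein) if aa == "*"]


# Residues 151-200 of ORF24a, written in blocks of ten as in the listing.
row_151 = ("RVILKAVFFT" "TSATSVNVVA" "SEFSNAAFTT"
           "PGPDTPTLIT" "ASASPEP*NA")

print(stop_positions(row_151, first_position=151))  # [198]
```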



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

I I I I I I I I I I I I I : I I I I I : I I I 

orf24-l MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf2 4a.pep 1 1 PS SSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I 

orf24-l I I PS SSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf24a.pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 

orf24-l TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 

MINIM MM llh IMIII Ml I I I I I I I I 

orf24-l PGPDTPTLITASASPEPXNAPAING1SSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf24a.pep SILIPARVLPILMELHTISWFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
I I I I I II I M II I I I I I I I I I II I I I I II I I II I II I I II I : II II I II II I I I I II I I 
orf24-l SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
250 260 270 280 290 300 
orf24a.pep KVCATLTX
           ||||||||
orf24-1    KVCATLTX



Homology with a predicted ORF from N. gonorrhoeae 

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from
N. gonorrhoeae:



orf24ng 
orf24 .pep 
orf24ng 
orf24 .pep 
orf24ng 



MRTAWLLLIMPMAASSAMMPSMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

Ill 1111:111 I I I I I I I : I I I I I I I 

MRTAVVLLLIMPMAASSAMMPSMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 

IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPIXSRMRATXSP 

IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 



120 
120 



TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180 



The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 



1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC
901 AAAGTCTGCG CCACGCTGAC ATAA



This encodes a protein having amino acid sequence <SEQ ID 680>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS

51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV

101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT

151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA

201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP

251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS

301 KVCATLT* 

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap: 

10 20 30 40 50 60 

orf 24-1. pep MRTAWLLLIMPMAAS S AMKPEMVCAGVS PGTAI I SKPTEQTAVMAS SLS SVST PASAAA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I I I I I I I I : I I I I I I I 
O r f 2 4 ng MRTAWLLLIMPMAASS AMMPEMVCAGVS PGTAIMSKPTEQTAVMAS SLS SVNT PASAAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 24-1. pep I I PS SSETGINAPLKFPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPI SSRMRATESP 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf24ng I I PS SSETGINAPLKPP7ALEAIMPPFFTASFSNAKAAVVPCVPQTLKPI SSRMRATESP 

70 80 90 100 110 120 
130 140 150 160 170 180

orf 24-1. pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I 1:111111 II I I I I I I I I I I : : I I I I I : II : I I 

orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 24-1. pep PGPDTPTLITASASPSPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

orf24ng PGPDTPTLITASAS P3PWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1. pep SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II : I I I I I I I I I ! I I I I I I I 

orf24ng SILIPARVLPILMELHTISVVFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS 

250 260 270 280 290 300 



orf 2 4-1. pep KVCATLTX 

I I I I I I I 
orf2 4ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa, double-underlined)
and putative transmembrane domains (single-underlined) in the gonococcal protein,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could
be useful antigens for vaccines or diagnostics, or for raising antibodies.
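
The putative leader described above is simply the first 18 residues of the gonococcal protein <SEQ ID 680>. The following Python sketch separates it from the rest of the first sequence row and reports how hydrophobic it is. It is illustrative only: the function names are hypothetical, and the residue set used to call a residue hydrophobic (A, V, L, I, M, F, W) is an assumption made for this sketch, not a criterion taken from the original analysis.

```python
HYDROPHOBIC = set("AVLIMFW")  # assumed residue set, for illustration only


def split_leader(protein: str, leader_length: int):
    """Split a protein string into (leader, remainder) at a fixed length."""
    return protein[:leader_length], protein[leader_length:]


def hydrophobic_fraction(peptide: str) -> float:
    """Fraction of residues in the peptide that belong to HYDROPHOBIC."""
    return sum(aa in HYDROPHOBIC for aa in peptide) / len(peptide)


# Residues 1-50 of ORF24ng, written in blocks of ten as in the listing.
row_1 = ("MRTAVVLLLI" "MPMAASSAMM" "PEMVCAGVSP"
         "GTAIMSKPTE" "QTAVMASSLS")

leader, rest = split_leader(row_1, 18)
print(leader)                                  # MRTAVVLLLIMPMAASSA
print(round(hydrophobic_fraction(leader), 2))  # 0.72
```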

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>: 

1 . . ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>:

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER
51 IQYLRGYSID *

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 
This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD

151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 
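
The correspondence between a nucleotide sequence and the amino acid sequence it encodes, as between SEQ ID 683 and SEQ ID 684 above, follows the standard genetic code. The following Python sketch is illustrative only and is not the software used in the original work; the function name `translate` is hypothetical. It reads codons from the first base, writes a stop codon as `*`, and writes any codon containing an ambiguous base (such as N) as X, which is an assumed convention chosen to mirror the listings.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; incomplete trailing codons are ignored."""
    dna = dna.replace(" ", "").upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 50 nucleotides of SEQ ID 683, as printed in the listing.
start_of_orf25 = "ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG"
print(translate(start_of_orf25))  # MYRKLIALPFALLLAA
```

The output agrees with the first 16 residues of SEQ ID 684; the final two nucleotides of the row belong to a codon completed on the next row and are left untranslated.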

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF25 shows 98.3% identity over a 60 aa overlap with an ORF (ORF25a) from strain A of N.
meningitidis:

10 20 30 

orf 25 . pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD 

I I I I I ! I I I I I 

orf 25a VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD 
250 260 270 280 290 300 

40 50 60 

orf 25 . pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 

orf 25a RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
310 320 330 

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 686>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

10 20 30 40 50 60 

orf 25a . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 
I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I II I I II I II 
Orf 25-1 MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
10 20 30 40 50 60

70 80 90 100 110 120 

orf25a.pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 

I I I I I I I I I I I I I I I I I I I I I I I I I I Ill 

orf25-l VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I : I I I I I I I I I I I I I I I I I I I I II I I I I I 
orf 25-1 SDIVRQKTGGNVE FKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 2 5a. pep MIDGKAVKKE DAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 

I I I I I I I I I I I I I I I I III I I I I : I II I I I I I I I I I I I I I I I : I I I I I I I I I I 
orf 25-1 MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 
I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 25-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 



310 320 330 339 

orf 25a. pep RQAAAQADRQEYAE YLKLQCDTRMTRERIQYLRGYS I DX 

I II I I I I I I I I I I I I I I I I I I I 

orf 25-1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

310 320 330 



Homology with a predicted ORF from N. gonorrhoeae 

ORF25 shows 100% identity over a 60 aa overlap with a predicted ORF (ORF25ng) from
N. gonorrhoeae:



orf 25. pep TDVQKELVGEQRKWAQEKISNCRQAAAQAD 30 

orf25ng VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308 

orf 25. pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID 60 

I I I I I I II 

orf25ng RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338 



The complete length ORF25ng nucleotide sequence <SEQ ID 687> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG 

51 CGGCAGGGAA GAACCGCCCA AGGCGT-GGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA 

351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC 

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG 

701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc 

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 









This encodes a protein having amino acid sequence <SEQ ID 688>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR 

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD 

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA 

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT 

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 
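
The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The following is a minimal sketch, assuming the standard bacterial genetic code and a reading frame that starts at the first base; the helper name `translate` is hypothetical and is not part of the specification.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons that cannot be resolved become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 30 nucleotides of the ORF25ng listing.
print(translate("ATGTATCGGA AACTCATTGC GCTGCCGTTT"))  # MYRKLIALPF
```

Codons containing an unread base (such as the `-` in the second row of the listing) fall through to `X` rather than being guessed.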



ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 



orf25-1.pep   MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
              ||||||||||||||||||||||||||||||||||| ||| |||||||||||||||||||| 
orf25ng       MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 

orf25-1.pep   VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 
              ||||||||||||||||||||||||||||||||||||||||||||||| |||||||||| | 
orf25ng       VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL 

orf25-1.pep   SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
               ||| |||||||||||||||||||||| ||  ||| ||||||| |||||||||||||||| 
orf25ng       ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 

orf25-1.pep   MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
              ||||||| |||||| ||||||||||||||||||||||||||||||||||||||||||||| 
orf25ng       MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

orf25-1.pep   DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 
              || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf25ng       DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

orf25-1.pep   RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
              ||||||||||||||||||||||||||||||||||||||| 
orf25ng       RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
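
The identity figures quoted for these pairwise comparisons are the fraction of aligned columns in which both sequences carry the same residue. A minimal sketch of that calculation follows, assuming the two rows are already aligned and of equal length; the function name `count_identities` is hypothetical, and the alignment program actually used may treat gaps and `X` residues differently.

```python
def count_identities(row_a: str, row_b: str) -> tuple[int, int]:
    """Return (identical columns, total columns) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(
        1 for a, b in zip(row_a, row_b) if a == b and a not in "-X"
    )
    return identical, len(row_a)


# First 60 columns of the ORF25-1 / ORF25ng comparison.
orf25_1 = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF"
orf25ng = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF"

identical, total = count_identities(orf25_1, orf25ng)
print(f"{identical}/{total} = {100 * identical / total:.1f}% identity")
```

Applied to the first row only, the two differences (G/D at position 36 and N/S at position 40) give 58 identical columns out of 60.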



Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
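
A lipoprotein lipid attachment site of this kind can be located with a simple pattern search. The sketch below assumes the PROSITE-style consensus `{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C` as recalled here; the specification does not state which pattern or program was used, and the additional positional rules that normally accompany such a pattern are not enforced. The function name is hypothetical.

```python
import re

# Assumed consensus for a prokaryotic lipoprotein lipid attachment site:
# six residues that are not D, E, R or K, two residues from [LIVMFWSTAG],
# one from [LIVMFYSTAGCQ], one from [AGS], then the lipidated cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_site(protein: str):
    """Return the 1-based position of the candidate cysteine, or None."""
    match = LIPOBOX.search(protein.upper())
    if match is None:
        return None
    # The pattern ends on the cysteine, so the exclusive 0-based end index
    # equals the cysteine's 1-based position.
    return match.end()


# N-terminal 30 residues of the gonococcal ORF25 protein.
print(find_lipid_attachment_site("MYRKLIALPFALLLAACGREEPPKALECAN"))  # 17
```

With this pattern the candidate cysteine is Cys-17, the residue immediately before the `GREEPP` stretch.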

ORF25-1 (37kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the 
results of expression of the His-fusion in E. coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS 
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and 
that it is a useful immunogen. 

Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1. 
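
Profiles of the kind plotted in Figure 16E are typically sliding-window averages of a per-residue scale. The sketch below is illustrative only: it uses the Kyte-Doolittle hydropathy scale as a stand-in, because the specification does not say which hydrophilicity scale, window size, or software produced the figure. Low (negative) hydropathy windows correspond roughly to hydrophilic regions. The function name `window_scores` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in scale, see note above).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein: str, window: int = 7) -> list[float]:
    """Average the scale over every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# N-terminal 20 residues of ORF25-1.
for position, score in enumerate(window_scores("MYRKLIALPFALLLAACGRE"), start=1):
    print(position, round(score, 2))
```

Residues outside the 20 standard amino acids (for example `X`) raise a `KeyError` in this sketch and would need to be masked or skipped first.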



Example 82 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN . . . 

// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KK. . 
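
The `X` residues in the partial ORF26 translation arise where the partial DNA sequence contains lower-case ambiguity codes (for example `w`, `y`, `s`, `k`). The sketch below expands such a codon into every amino acid it could encode, assuming the standard IUPAC nucleotide ambiguity codes and the standard genetic code; the function name `possible_amino_acids` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def possible_amino_acids(codon: str) -> set[str]:
    """Return every amino acid an ambiguous codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


# The two codons covering the "wys" run at positions 115-120 of SEQ ID 689.
print(sorted(possible_amino_acids("Gwy")))  # ['D', 'V']
print(sorted(possible_amino_acids("sGC")))  # ['G', 'R']
```

Both codons are ambiguous between two residues, which is consistent with the `XX` at positions 39-40 of the partial translation and with the `VG` found at the same positions once the complete sequence was determined.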

Further work revealed the complete nucleotide sequence <SEQ ID 691>: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF 

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS 

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA 

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KKRANA* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263) 

ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the 
N-terminus and C-terminus, respectively: 

Orf2 6 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Orf2 6 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 

// 

Orf26 86 IFTSLLTYSGS — NTSLVFGGTCGVFAWLCTL — GTIKTADYPKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLI ILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 

Orf26 142 XXXXXXXSTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

Orf26 202 IAAAMAVKVE PAL 1 1 PCMSAVMAGAVCGDHCS PI S DTT I LS STGARCNHI DHVTSQXXXX 261 

IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAI^CNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 ATVATAT S IG YI WGFT YSGLAGFAATAVSLIVI IFAVKKR 519 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N. 
meningitidis: 



orf26.pep 

orf26a 



MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 
MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVDGLTHLKDMV 



rf26.pep 
rf26a 



VGLAWSDXDWSLGKPK ILVFXILLGIFTSLLTY SGSNXX- 






or f 2 6. pep 

or f 2 6a LVFVTFID DYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMP VSSWGASIIA 
130 140 150 160 170 180 



or f 2 6. pep 
orf26a 



250 260 270 280 290 300 

120 130 140 150 160 170 

FGGTCGVFAWLCTL GTIKTADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYL 
I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FGGTCGVLAWLCTL GTIKIADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYL 
310 320 330 340 350 360 

180 190 200 210 220 230 

STLVAGNIHP GFLPVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV EP ALI IPCMSA 

STLVAGNIHP GFLXVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV DP SLIIPCMSA 
370 380 390 400 410 420 

240 250 260 270 280 290 

VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I! I I I I I 
VMAGAVCG DHC SPISDTT I LS S TGARCNHI DKVT SQLPY ALTVAAAAASGYLALGL TKSA 
430 440 450 460 470 430 

300 310 
LLGFGTTGIVLAVLIFL LKDKK 

: II 

LLG FGXTGIVLAVLIFL LKDKKRANAX 
490 500 

The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG 

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 



1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF 

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS 

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG 

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD 

501 KKRANA* 

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 

10 20 30 40 50 60 

orf2 6a.pep MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I Ill 

orf2 6-l MOLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf2 6a.pep VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf2 6-l VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf26a.pep LVFVT FI DDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVS SWGAS I IA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I: I I II I I I I I I II I I I I I I I I I I I I I I I 
orf26-l LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPV5SWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

or f 2 6a. pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I i 
or f 26-1 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf26a.pep AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 
I I I I I I I I I : : I I I I I I II I I I I I I I I I II I II I I I I I I I II I I I I I I I I I I I I I I I I I 
orf26-l AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

Orf26a.pep FGGTCGVLAWLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 

orf2 6-l FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf26a.pep STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I : I : I I I I I I I I 

orf26-l STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf26a.pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
I I II I I I I I I II I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I 
orf2 6-l VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 



orf26a.pep 



490 500 
LLGFGXTGIVLAVLIFLLKDKKRANAX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus, 
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae: 



or f 2 6. pep 
orf 26ng 
orf26 .pep 
orf26ng 



MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 



I ; 



I I I I I 



II I I I I I 



I I I 



MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 
I I I I I : I I I I I I I I I I I I I II 

VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 
// 



orf 26. pep TSLVFGGTCGVFAWLCTLGTIKTADYPKA 326 

orf26ng ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAWLCTFGTIKTADYPKA 32 6 

orf 26. pep VWQGAKSMFGAIAILILAWLISTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386 

I I I I I I II I I I I I I I I I I I I I I 

orf26ng VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 386 

orf 26. pep ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 44 6 

I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I 

orf26ng ATGT5WGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 44 6 

orf 26 . pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502 

orf26ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 506 

The complete length ORF26ng nucleotide sequence <SEQ ID 695> is: 



1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtcccGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCGACGTTTG 



This encodes a protein having amino acid sequence <SEQ ID 696>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF 

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS 

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA 

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KKRADV* 

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 

10 20 30 40 50 60 

orf26-1.pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

orf26ng     MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
10 20 30 40 50 60 



70 80 90 100 110 120 

VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 
70 80 90 100 110 120 

130 140 150 160 170 180 

LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I 1 I I I I I : I I I I I I I I I I I I I I I I I 
LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 

130 140 150 160 170 180 

190 200 210 220 230 240 

TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 



250 260 270 280 290 300 

AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 
I : I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 
I I I I I I I I I I I I I I : I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FGGTCGVLAVVLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTWGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

STLVAGNIHPGFLPVILFLLASVKAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I ! I I I I I I I I I I I I I I I I I 
STLVAGNIHPGFLPVILFLLASVKAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
430 440 450 460 470 480 



490 500 
orf 26-1 . pep LLGFGTTGIVLAVLIFLLKDKKRANAX 

I I I I I I I I I I I I I I I I I I I I I I I I : : 
or f 2 6ng LLGFGTTGIVLAVLI FLLKDKKRADVX 

490 500 

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein: 



sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical 
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H. 
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519 

Score = 538 bits (1370), Expect = e-152 
Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%) 

Query: 1   MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60 
           M+LID+S S +S+VP LA+ LA+ TRR L +L V 
Sbjct: 14  MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query: 61  VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 
           V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 
Sbjct: 74  VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180 
           LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240 
           + GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA----SETFSILGAFENTDVN 296 
           +D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 
Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTVVG 312 

Query: 297 TSLVFGGTCGVL--AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTVVGEM 354 
           TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 
Sbjct: 313 TSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 
           TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 
           +PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492 

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 
           S L GF T + L V+IF +K + 
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 
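
The summary line of the HI1586 comparison reports each statistic as a count over the alignment length together with a whole-number percentage. The sketch below reproduces those percentages from the raw counts; rounding to the nearest integer is an assumption about how the search program formats its output, and the function name is hypothetical.

```python
def as_percent(count: int, alignment_length: int) -> int:
    """Express count/alignment_length as a whole-number percentage."""
    return round(100 * count / alignment_length)


alignment_length = 507
summary = {"Identities": 280, "Positives": 346, "Gaps": 7}

for label, count in summary.items():
    print(f"{label} = {count}/{alignment_length} ({as_percent(count, alignment_length)}%)")
```

"Positives" counts identical residues plus substitutions that score above zero in the scoring matrix, which is why it is always at least as large as "Identities".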





Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>: 

1 ..AAGCAATGGT ATGCCGACGN . AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 



Further work revealed the complete nucleotide sequence <SEQ ID 699>: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 




201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>: 

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF27 shows 91.5% identity over an 82aa overlap with an ORF (ORF27a) from strain A of N. 
meningitidis: 

10 20 30 

orf27 .pep KQW YADX S I KTEMVMVN DEPAKI LTWDE S G 

1 I I I I I : I I I I M I I I I I I I I I I I I I I I I 

orf27a LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG 
140 150 160 170 180 190 

40 50 60 70 80 

orf27 .pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX 



The complete length ORF27a nucleotide sequence <SEQ ID 701> is: 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 



This encodes a protein having amino acid sequence <SEQ ID 702>: 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP* 

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 

10 20 30 40 50 60 

orf27a.pep  MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF 
            ||||||||||||||||||||||| |||||||||||||| |||||||||||  |||||| | 
orf27-1     MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf27a.pep XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I 

orf27-1 YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf27a.pep NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN 

I I I I I I I I II: 

orf27-l NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf27a.pep DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG 
I I I I I I I II I I I I I I I I I I I I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II 
orf27-1 DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 

190 200 210 220 230 240 



orf27a.pep YLIEPX 
orf27-l YLIEPX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N. gonorrhoeae: 

orf27.pep KQWYADXSIKTEMVMVNDEPAKILTWDESG 30 

orf27ng LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193 

orf27.pep RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82 
orf27ng RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245 

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 



1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 
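
Sequence listings in this format (a leading position number followed by blocks of ten residues) can be joined into a single string before further analysis. The sketch below is illustrative only; the function name `parse_listing` and the choice to report, rather than reject, lines whose position number disagrees with the running count are assumptions. Letter case is preserved because lower-case bases are used elsewhere in these listings.

```python
import re


def parse_listing(lines):
    """Join a numbered, space-blocked sequence listing into one string.

    Returns (sequence, warnings). Each warning is a pair
    (declared_position, expected_position) for a line whose leading
    number does not match the residues read so far.
    """
    parts = []
    warnings = []
    count = 0
    for line in lines:
        # Keep letters and the '*' stop symbol; drop digits, spaces, dots.
        residues = re.sub(r"[^A-Za-z*]", "", line)
        if not residues:
            continue
        declared = re.match(r"\s*(\d+)\s", line)
        if declared and int(declared.group(1)) != count + 1:
            warnings.append((int(declared.group(1)), count + 1))
        parts.append(residues)
        count += len(residues)
    return "".join(parts), warnings


# First two lines of the ORF27ng listing above.
example = [
    "1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC",
    "",
    "51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA",
]
sequence, warnings = parse_listing(example)
print(len(sequence), warnings)
```

A non-empty warning list is a quick way to spot lines where a position number or a block of residues was damaged during text extraction.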

This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PA QTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

10 20 30 40 50 60 

orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF 
orf27ng MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF 



YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP 



130 140 150 160 170 180 

NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN 

130 140 150 160 170 180 



orf27-1.pep YLIEPX 
orf27ng YLIEPX 

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the 
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is 
a surface-exposed protein and a useful immunogen. 

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 
151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLVV IAFLLTAV AT WTGQPPTRGG V LVGLTI FWL AARIAAFI PG 

101 WGASAS GILG TLFFWYGAVC MAL PVIRSQN QRN YVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRII SFFTS KRLNVPQIPS 

201 PKW VAQASLW LPMLTAMLMA HGVLAW LSAV FAFAAGVIFT VQV YRWWYKP 

251 VLKEPMLW IL FAGYLFTGLG LIAVG ASYFK PA FLNLGVHL IGVGGIGVL T 

301 LGMMARTALG HTGNPIYPPP KAVP VAFWLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 47 .pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEM IWGYAGLW 
I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 47a MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEM IWGYAGLW 



70 80 90 100 110 120 

or f 4 7 . pep IAFLLTAV ATWTGQPPTRGGV LVGLTIFWLAARIAAFI PGWGASAS GILGTLFFWYGAVC 

orf47a IAFLLTAV ATWTGQPPTRGGV LVGLTI FWLAARIAAFI PGWGASAS GILGTLFFWYGAVC 

70 80 90 100 110 120 



MALPVIRSQNQRN YVAVFALFVLGGTHAAF HVQLHNGNLGGLLSGLQS GLVMVSGFIGLI 



orf47; 



GTRII SFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT 
190 200 210 220 230 240 
The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 

1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLVV IAFLLTAV AT WTGQPPTRGG V LVGLTIFWL AARIAAFI PG 

101 WGASAS GILG TLFFWYGAVC MAL PVIRSQN QRN YVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVSGFIGLI GTRII SFFTS KRLNVPQIPS 

201 PKW VAQASLW LPMLTAMLMA HGVMPW LSAA FAFAAGVI FT VQV YRWWYKP 

251 VLKEPMLW IL FAGYLFTGLG LIAVG AS YFK PA FLNLGVHL IGVGGIGVL T 

301 LGMMARTALG HTGNPIYPPP KAVP VAFWLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

                    10        20        30        40        50        60
orf47a.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV

                    70        80        90       100       110       120
orf47a.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC

                   130       140       150       160       170       180
orf47a.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI

                   190       200       210       220       230       240
orf47a.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
            |||||||||||||||||||||||||||||||||||||||||||: ||||:||||||||||
orf47-1     GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT

                   250       260       270       280       290       300
orf47a.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT

                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX



Homology with a predicted ORF from N. gonorrhoeae 

ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from 
N. gonorrhoeae: 

ORF47    MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47    IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
         |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng  IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47    MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
ORF47ng  MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI



The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 712>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLW IAFLLTAVA T WTGQPPTRGG VLVGLTAFWL AARIAAFI PG 

101 WGAAAS GILG TLFFWYGAVC MAL PVIRSQN RR HYVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GL-LSGLQS GL VKVWGFIGLI GMKI I SFFTS KRLKLPQIPS 

201 PKWVAHASLW LPMLNAILMA HRVMPW L5AA FPFAAGVIFT VQV YAGGITP 

251 IEETSCGSVA GICYRLGNSS G 

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala 
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the 
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540): 



INTEGRAL    Likelihood = -5.63    Transmembrane
INTEGRAL    Likelihood = -3.88    Transmembrane
INTEGRAL    Likelihood = -3.08    Transmembrane
INTEGRAL    Likelihood = -1.91    Transmembrane
INTEGRAL    Likelihood = -1.44    Transmembrane
INTEGRAL    Likelihood = -1.38    Transmembrane
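
The substitutions named in the preceding paragraph can be located by a column-by-column comparison of the two proteins. The sketch below assumes the rows are already aligned without gaps and uses residues 81 to 150 as transcribed from the ORF47/ORF47ng alignment; the function name `substitutions` and the variable names are illustrative.

```python
def substitutions(ref, other, start=1):
    """List (position, ref_residue, other_residue) for every differing column.

    `start` is the sequence position of the first column, so the reported
    positions use the numbering of the full-length protein.
    """
    if len(ref) != len(other):
        raise ValueError("rows must have the same length")
    return [
        (start + i, r, o)
        for i, (r, o) in enumerate(zip(ref, other))
        if r != o
    ]


# Residues 81-150, meningococcal (ORF47) and gonococcal (ORF47ng) rows.
MEN_81_150 = (
    "VLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC"
    "MALPVIRSQNQRNYVAVFALFVLGGTHAAF"
)
GON_81_150 = (
    "VLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC"
    "MALPVIRSQNRRNYVAVFAIFVLGGTHAAF"
)

for pos, men, gon in substitutions(MEN_81_150, GON_81_150, start=81):
    print(pos, men, gon)
```

Within this window the comparison reports positions 87 (I/A) and 140 (L/I), which are the two named in the text, together with 104 (S/A) and 131 (Q/R).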



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>: 



1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 
251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 
451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 
501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 
651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 M IWGYAGLVV IAFLLTAV AT WTGQPPTRGG V LVGLTAFWL AARIAAFI PG 

101 WGAAAS GILG TLFFWYGAVC MAL PVIRSQN RRN YVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQS GL VMVSGFIGLI GMRII SFFTS KRLNVPQIPS 

201 PKW VAQASLW LPMLTAILMA HGVMPW LSAA FAFAAGVIFT VQV YRWWYKP 

251 VLKEPMLW IL FAGYLFTGLG LIAVG ASYFK P AFLNLGVHL IGVGGIGVL T 

301 LGMMARTALG HTGNSIYPPP KAVP VAFWLM MAATAVRMVA V FSSGTAYTH 

351 SIRTSSVLFA LALLVYA WKY IPWLIRPRSD GRPG* 

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap: 

10 20 30 40 50 60 

orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 

orf47ng-1 MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 47-1. pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 

I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I : I I I II I I I I I I I I I I I 

orf47ng-l IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 47-1. pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 

orf47ng-1 MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 47-1. pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT 
I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I : I II I I I : 1111:1111111111 
orf47ng-l GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 47-1. pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 

I I I I II I I I I I I I I I I I I I I I I I I I II I I I II I I I 

orf47ng-l VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf 47-1. pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 

III I I I I I I I I I I I I I 

orf47ng-l LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA 

310 320 330 340 350 360 

370 380 
orf 47-1 . pep LALLVYAWKYIPWLIRPRSDGRPGX 
I I I I I I I I I I I I I I I I I I I I I I I II 
orf47ng-l LALLVYAWKYIPWLIRPRSDGRPGX 
370 380 

Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri: 

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396 
Score = 155 bits (389), Expect = 5e-37 
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%) 

Query: PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY WHAHEMIWGYAGLV 59 
       P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A + 
Sbjct: PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP-- GFQPTGGWLAWHRHEMLFGFAMAI 71 

Query: VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119 
       V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF 
Sbjct: VAGFLLTAVQTWTGQTAPSGNRLVGLAAVWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130 

[Remaining alignment blocks are not legible in the source. Surviving coordinates: 120; Sbjct: 250; Query: 294; Sbjct: 310; Query: 354; Sbjct: 366. Final match line:] 
       LA +Y W+Y P L+ R DG PG 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 715>: 

1 . . ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 

1 . . MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX 

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI . . 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N. gonorrhoeae: 
orf67.pep MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30 

orf67.pep VFQASPVVVTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90 
orf67ng VFQASPWVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVSI 206 

orf67.pep XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSLMF 150 
: I : I : : : : I I I I I I I : I I I I I I : I I I I I I I I I I I I I I I I I I II III 
orf67ng TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFVVEFEIVNGGQAERRNGVECAVFLMF 266 

orf67.pep CLGFFVV WYLFSNFFSRRITFF-PFSVTGI ICRYSPAAEI 190 
I I I : I I I I : 
orf67ng RLLVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGI IRGDAPAAEWADRHPGVDGM 326 



The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 718>: 

1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL 
51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR 
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 
151 SPVWAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR 
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG 
251 QAERRNGVE C AVFLMFRLLV FYVKLVA AKS 
301 PVTGI IRGDA PAAEWADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 
351 IVGNAFGGVG * 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA. . . 

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>: 

1 MFAFLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK 

101 F DKYGNWVLF VARFLPGL RT AVFVTAGISR KVSYLRFIIM DGLAA. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 721>: 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>: 

1 MFAFLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FIIM DGLAALISVP 

151 IWIYLGEYGA HNIDWLMAKM HSLQ SGIFVI LGIGATVVAW I WWKKRQRIQ 

201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ* 

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H. influenzae (accession number P45280) 
ORF78 and the dedA homologue show 58% aa identity in 144aa overlap: 

Orf78: 4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM — GYTNPHIMFAVGMLGV 61 

FL FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GV 

DedA: 20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79 

Orf78: 62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121 

L GD M+ GRI+G L F PI I+T R V+EKF +YGN VLFVARFLPGLR 
DedA: 80 LAGDSCMYWLGRI YGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 13 9 

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145 

+++ +GI+R+VSY+RF+++D AA 
DedA: 140 IYMVSGITRRVSYVRFVLIDFCAA 163 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 78 . pep MFAFLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 

or f 78a MFALLEAFFVEYG YAAVFFVLVICGFGVPI PEDLTLVTGGVISGMGYTNPH IMFAVGMLG 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 78 . pep VLVGDGIM FAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNW VLFVARFLPGLRT 
I I I I I I I I I I I I I I I I I I ! ill I I I I II I I I I I I I I II I I I I I I I I I II I I I I 
o r f 7 8 a VLVGDGIMFAAGRI WGQK ILKFKPIARIMT PKRYAQVQEKFDKYGNW VLFVARFLPGLRT 

70 80 90 100 110 120 



130 140 
AVFV TAGISRKVSYLR FI IMDGLAA 

AVFV TAGISRKVSYLR FLIMDGLAALISVPVWI YLGEYGAHNIDWLMAKMHSLQ SGIFIA 
130 140 150 160 170 180 

The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 



501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence <SEQ ID 724>: 

1 MFALLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLVGDGIM FA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQ SGIFIA LGVLAAALAW F WWRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

10 20 30 40 50 60 

MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

MFAFLEAFFVE YGYAAVFFVLVICGFGVPI PEDLTLVTGGVI SGMGYTN PH IMFAVGMLG 
10 20 30 40 50 60 



orf78a.pep 
orf78-l 



70 80 90 100 110 120 

orf78a.pep vlvgdgimfaagriwgqkilkfkp:arimtpkryaqvqekfdkygnwvlfvarflpglrt 

I I I I I I I I I I I I I I I I li I I : I I I M ! I I i I II I I II II I I I I I I I I I I I I I I I I I I 
orf78-l vlvgdgimfaagriwgqkilrfkpiarimtpkryeqvqekfdkygnwvlfvarflpglrt 
70 80 90 100 110 120 



130 140 150 160 170 180 

or f 7 8a. pep avfvtagisrkvsylrflimdglaalisvpvwiylgeygahnidwlmakmhslqsgifia 

orf78-l avfvtagisrkvsylrfiimdglaalisvpiwiylgeygahnidwlmakmhslqsgifvi 
130 140 150 160 170 180 



190 200 210 220 

orf78a.pep lgvlaaalawfwwrkrrhyqlyraqlsekrakrkaekaakkaaqkqqx 
II: I : :: I I : I I : I I :: I : I I : : I : I I I I : I I I I I I I I I I I :: I I 
orf78-l LGI GAT WAW I wwkkrqr iqfyrs klkekraqrkaakaakkaaqskqx 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf78 .pep XXLXFXPIAXIMT PXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137 

I I I I I I I I I I I I I I I I I I I I I 

orf7 8ng YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32 

orf7 8.pep IIMDGLAA 145 

orf7 8ng LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92 

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 

1 . . YP VLFVARFL PGLRTAVFV T AGISRKVSYL R FLIMDGLAA LISVPVWI YL 
51 GEYGAHNIDW LMAKMHSLQ S GIFIALGVLA AALAWF WWRK RRHYQLYRAQ 
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 
51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 
101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 
151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 
201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 
251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 
301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>: 

1 MFALLEAFFV EYG YAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 H IMFAVGMLG VLAGDGVM FA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNW VLF VARFLPGLRT AVFV TAGISR KVSYLR FLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQ SGIFIA LGVLAAALAW FW WRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap: 

10 20 30 40 50 60 

orf 78-1 . pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
I I I : I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I 
orf7 8ng-l MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 78-1 . pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
I I : I I I : I I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf7 8ng-l VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7 8-1. pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 

orf7 8ng-l AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 
130 140 150 160 170 180 

190 200 210 220 

orf 78-1 . pep LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 

orf78ng-1 LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
190 200 210 220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae: 

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA 
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20) 
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212 
Score = 223 bits (563), Expect = 7e-58 

Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%) 



Query: 5   LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct: 21  LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query: 63  AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct: 81  AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200

Query: 183 VL 184
            L
Sbjct: 201 YL 202
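
The percentages on the BLAST summary line for the dedA comparison follow directly from the reported counts. The sketch below parses such a line and recomputes each percentage; the function name `parse_blast_summary` is illustrative, and the use of integer truncation (rather than rounding) is an inference that happens to fit the three figures reported here, not a statement about how the BLAST program computes them.

```python
import re

SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_blast_summary(line):
    """Return {field: (count, total, reported_pct, recomputed_pct)}."""
    stats = {}
    for name, count, total, reported in SUMMARY_FIELD.findall(line):
        n, d = int(count), int(total)
        # Integer truncation is assumed; see the note above.
        stats[name] = (n, d, int(reported), n * 100 // d)
    return stats


line = "Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)"
stats = parse_blast_summary(line)
for name, (n, d, reported, recomputed) in stats.items():
    print(f"{name}: {n}/{d} reported {reported}%, recomputed {recomputed}%")
```

A mismatch between the reported and recomputed columns would suggest that a digit was corrupted when the text was extracted.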
Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 

Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH . . 

Further work revealed the complete nucleotide sequence <SEQ ID 731>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis: 

                   10        20        30        40        50        60
orf79.pep  MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
           || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79a     MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS

orf79.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
           |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79a     PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
                   70        80        90       100       110       120

orf79.pep  VTLKFKNAKAQTVQLEVKIAPMPAMNH
           |||||||||||||||||| ||| || |
orf79a     VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
                  130       140       150

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGA VSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

                   10        20        30        40        50        60
orf79a.pep MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS
           || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79-1    MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS

orf79a.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
           |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79-1    PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

orf79a.pep VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
           |||||||||||||||||| ||| || |||||||||||
orf79-1    VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
                  130       140       150
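
As an illustrative aside, the percent identity quoted for an alignment of this kind can be recomputed from the two aligned sequences. The following Python sketch is not part of the original analysis: the function name `percent_identity` is hypothetical, and it assumes an ungapped alignment in which an undetermined residue (X) is never counted as a match. The sequences are those of SEQ ID 734 and SEQ ID 732 without the terminal stop.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns that hold the same residue.

    Assumes `a` and `b` are already aligned and of equal length.
    'X' (undetermined residue) and '-' (gap) never count as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x not in "X-")
    return 100.0 * matches / len(a)


# ORF79a (SEQ ID 734) and ORF79-1 (SEQ ID 732), 157 residues each.
ORF79A = (
    "MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEA"
    "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
    "SYHVMFMGXKKQLKXGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMDHGHH"
    "HGEAHQH"
)
ORF79_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEA"
    "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
    "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNHGHH"
    "HGEAHQH"
)

print(round(percent_identity(ORF79A, ORF79_1), 1))  # 94.9
```

Eight of the 157 columns differ (three of them because ORF79a carries an X), which gives 149/157 and agrees with the 94.9% stated above.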

Homology with a predicted ORF from N. gonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf79.pep  FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101
                                         ||||||||||||:|||||||||||||||||
orf79ng                                  INDNGVMRMREVKGGVPLEAKSVTELKPGS 30

orf79.pep  YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147
           ||||||||||||||||||||||||||||||||||||| ||| ||||
orf79ng    YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 



Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 

1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 


This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>: 

1 MKKLLAAVMM AGLAGA VSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 
151 HGEAHQH* 

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap: 

                     10        20        30        40        50        60
orf79-1.pep  MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
             ||||||||||||||||||||||||||||||||||||| |||||||||||| ||| ||||
orf79ng-1    MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM

orf79-1.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
             ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||
orf79ng-1    PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP

orf79-1.pep  VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
             |||||||||||||||||| ||| ||||||||||||||
orf79ng-1    VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX
                    130       140       150

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus: 

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151 

Score = 63.6 bits (152), Expect = 6e-10 

Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%) 

Query: 24 VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83
          V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M
Sbjct: 27 VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86

Query: 84 KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137
          +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V
Sbjct: 87 ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139
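
For readers who wish to check the summary line of such a report, the identity and positive counts can be recovered from the middle (match) rows of the alignment. The Python sketch below is an illustration only and is not part of the original analysis; the function name `summarise_midline` is hypothetical, and the truncation of percentages to whole numbers is an assumption chosen because it reproduces the figures quoted above.

```python
def summarise_midline(midline_rows, alignment_length):
    """Count identities and positives from BLAST match rows.

    In a BLAST match row a letter marks an identical pair, '+' marks a
    conservative substitution, and a blank marks a mismatch or a gap.
    Percentages are truncated to whole numbers (an assumption).
    """
    midline = "".join(midline_rows)
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return (
        identities,
        positives,
        100 * identities // alignment_length,
        100 * positives // alignment_length,
    )


# Match rows of the ORF79ng-1 / Aquifex aeolicus alignment. Column
# spacing does not matter for counting, so single spaces are enough.
ROWS = [
    "V+ W G M I N+ D+++G +A RVE+H + +N V +M",
    "+ + + K E K YHVM +GLKK++KEGDK+ V L F+ + TV+ V",
]

print(summarise_midline(ROWS, 114))  # (38, 58, 33, 50)
```

The result agrees with "Identities = 38/114 (33%), Positives = 58/114 (50%)" in the report.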

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful 
immunogen. 
Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
739>: 



1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 
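
The lower-case letters s, y, r, k and n in the nucleotide listings are IUPAC ambiguity codes, and the corresponding positions in the translation appear as X only where the ambiguity changes the encoded residue. The Python sketch below illustrates that relationship; it is not the software used for the original analysis. The names `translate_codon` and `translate` are hypothetical, the standard genetic code is assumed, and any character that is not a recognised IUPAC code is treated as "any base".

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; return 'X' if the ambiguity changes the residue."""
    choices = [IUPAC.get(base, "ACGT") for base in codon.upper()]
    residues = {CODON_TABLE["".join(p)] for p in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, ignoring whitespace."""
    dna = "".join(dna.split())
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


# Nucleotides 451-489 of SEQ ID 739 (in frame): AAs is Asn or Lys, hence X.
print(translate("TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGAT"))  # SNAVKAALPXDGD
# Nucleotides 421-429 of SEQ ID 739: ATy is Ile either way, so no X.
print(translate("TGGACGATy"))  # WTI
```

This matches the listing above, where residue 160 of ORF98 is X while residue 143 is I.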

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>: 



1 MTEXAAEGGK AAKALKKYL I TGILVWLPIA VTVWVV SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

                   10        20        30        40        50        60
orf98.pep  MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
           ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98a     MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf98.pep  GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||   |
orf98a     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf98.pep  SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY
           ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98a     SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
                  130       140       150       160       170       180

                  190       200       210       220       230
orf98.pep  IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX
           ||||||||||||||||| ||||||||||||||||||||| |||||||||||||
orf98a     IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
                  190       200       210       220       230

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTVWVV SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 

                   10        20        30        40        50        60
orf98a.pep MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
           ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98-1    MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf98a.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
orf98-1    GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf98a.pep SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
           ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf98-1    SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
                  130       140       150       160       170       180

                  190       200       210       220       230
orf98a.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
           |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98-1    IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
                  190       200       210       220       230

Homology with a predicted ORF from N. gonorrhoeae 

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 

                   10        20        30        40        50        60
orf98.pep  MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60
           ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng    MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

orf98.pep  GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120
           |||||||||||||||||||||||| |||||||||||||| |||||||||||||||||  |
orf98ng    GFNIPGLGVIVAIAVLFVTGLFAAMVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120

orf98.pep  SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180
           ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98ng    SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180

orf98.pep  IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233
           ||||||||||||||||| ||||||||||||||||||||| ||| ||| |||||
orf98ng    IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein 
having amino acid sequence <SEQ ID 746>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTVWVV SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA AMVLGRQ ILAAWDSLLX 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCXGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>: 

1 MTEPAAEGGK AAKALKKYL I TGILVWLPIA VTVWVV SYIV SASDQLVNLL 

51 PKQWRPQYVL GFNIPG LGVI VAIAVLFVTG LFA ANVLGRQ ILAAWDSLLG 

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap: 

                     10        20        30        40        50        60
orf98-1.pep  MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
             ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng-1    MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf98-1.pep  GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng-1    GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf98-1.pep  SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
             ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98ng-1    SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY
                    130       140       150       160       170       180

                    190       200       210       220       230
orf98-1.pep  IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
             ||||||||||||||||||||||||||||||||||||||||||| |||:|||||
orf98ng-1    IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX
                    190       200       210       220       230

Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 
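
The text does not state which method was used to predict the putative transmembrane domains. Purely as an illustration, a Kyte-Doolittle sliding-window hydropathy scan is one common way to flag candidate membrane-spanning segments. In the Python sketch below the function name `hydrophobic_segments`, the 19-residue window and the 1.6 threshold are assumptions made for the example, not parameters taken from the source.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """Return 1-based (start, end) spans whose windows score >= threshold.

    Each window's score is the mean hydropathy of its residues; unknown
    residues such as 'X' score 0.0. Overlapping or adjacent qualifying
    windows are merged into a single span.
    """
    seq = protein.upper()
    spans = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# First 60 residues of ORF98-1 (SEQ ID 742).
ORF98_1_NTERM = "MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL"

for start, end in hydrophobic_segments(ORF98_1_NTERM):
    print(f"candidate hydrophobic segment: residues {start}-{end}")
```

A scan of this kind only nominates hydrophobic stretches; it does not distinguish a cleavable leader peptide from a membrane-spanning helix.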



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>: 

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GjjGgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 G^GAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAS3TG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA .ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC. . . 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF 

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 

351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 751>: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 

1201 GCAGCGTTAG AGCAGCATAG CTGA 


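This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 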
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 
401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of N. 
meningitidis: 



                   10        20        30        40        50        60
orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
           |||||||||||||| |||||||| ||||||||||||||||||||||||||||||||||||
orf100a    MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf100.pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR
           ||||||| ||||||||||||| |  |||||||||||||||||||||||||| ||   |||
orf100a    FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf100.pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
           |||||| ||||||||||  |||||||||||||||||||||||||||||||||||||||||
orf100a    TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
                  130       140       150       160       170       180

                  190       200       210       220       230       240
orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA
           ||||||||||||||| :|||||||||||||||||| ||||| |||||||||||||||||
orf100a    AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX
                  190       200       210       220       230       240

                  250       260       270       280       290       300
orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA
           |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
orf100a    DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
                  250       260       270       280       290       300

                  310       320       330       340       350       360
orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL
           ||||||||||| |||||||||||||||||||||| |||||| ||||||||||||||||||
orf100a    FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL
                  310       320       330       340       350       360

                  370       380
orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH
           |||||||||| ||||| ||||||||
orf100a    KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX
                  370       380       390       400



The complete length ORF100a nucleotide sequence <SEQ ID 753> is: 

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 




This encodes a protein having amino acid sequence <SEQ ID 754>: 

1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGV LNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 

ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap: 

                     10        20        30        40        50        60
orf100a.pep  MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
             |||||||||||||| |||||||| ||||||||||||||||||||||||||||||||||||
orf100-1     MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf100a.pep  FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR
             ||||||| ||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf100-1     FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf100a.pep  TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100-1     TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf100a.pep  AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX
             ||||||||||||||||||||||||||||||||||| ||||| |||||||||||||||||
orf100-1     AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf100a.pep  DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100-1     DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf100a.pep  FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL
             |||||||||||:|||||||||||||||||||||| |||||||||||||||||||||||||
orf100-1     FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL
                    310       320       330       340       350       360

                    370       380       390       400
orf100a.pep  KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX
             |||||||||||||||| ||||||||||||||  |    |  | | |
orf100-1     KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX



Homology with a predicted ORF from N. gonorrhoeae 

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from 
N. gonorrhoeae: 

orf100.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng   MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 60

orf100.pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120
           |||||||||| | | |||||| |  |||||||||||||||||||||||||| ||   |||
orf100ng   FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120

orf100.pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180
           |||||| ||||||||||  |||||||||||||||||||||||||||||||||||||||||
orf100ng   TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180

orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 240
           |||||||||||||||  ||||||||||||||||||||||||||||||||||||||||| |
orf100ng   AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 240

orf100.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300
           |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
orf100ng   DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300

orf100.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360
           ||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||
orf100ng   FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360

orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH 386
           |||| ||||| |||||    |||||
orf100ng   KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405



The complete length ORF100ng nucleotide sequence <SEQ ID 755> is: 

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT 

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA 

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT 

501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG 

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 756>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGV LNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP 

401 SAETR* 

ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap: 

                     10        20        30        40        50        60
orf100-1.pep MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng     MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf100-1.pep FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR
             |||||||||| | | |||||||||||||||||||||||||||||||||||| ||||||||
orf100ng     FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf100-1.pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng     TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf100-1.pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf100ng     AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf100-1.pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf100ng     DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf100-1.pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL
             ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf100ng     FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL
                    310       320       330       340       350       360

orf100-1.pep KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX
             |||| |||||||||||    |||||||||||  |    |  |
orf100ng     KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX
                    370       380       390       400

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
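
The RGD motif mentioned above can be located by a plain substring search. The following Python sketch is illustrative only and is not part of the original analysis; the function name `find_motif` is hypothetical, and the sequence is the first 250 residues of ORF100-1 as obtained by translating SEQ ID 751.

```python
import re


def find_motif(protein: str, motif: str = "RGD"):
    """Return the 1-based start positions of every occurrence of `motif`.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = f"(?={re.escape(motif)})"
    return [m.start() + 1 for m in re.finditer(pattern, protein)]


# Residues 1-250 of ORF100-1 (SEQ ID 752).
ORF100_1_NTERM = (
    "MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLI"
    "AVVVWYFLFKFIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRF"
    "EKAELEASRVLVNKEAGDNRTLALMLGAHAAGQMENIELRDRYLAEIAKL"
    "PEKQQLSRYLLLAESALNRRDYEAAEANLHAAAKMNANLTRLVRLQLRYA"
    "FDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLADAADAAALKT"
)

print(find_motif(ORF100_1_NTERM))  # [203]
```

The single hit at residues 203-205 corresponds to the FDRGDALQVL stretch at the start of the row numbered 201 in the listings.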



Example 90 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
757>: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 
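
The correspondence between a nucleotide listing and its amino acid listing can be checked with a short translation routine. The following is only a minimal sketch and not part of the original analysis: it assumes the standard genetic code, and the function name `translate` is hypothetical. Any codon containing a non-ACGT symbol (such as the `s` at position 406 of the sequence above) is reported as `X`, and a stop codon as `*`, matching the notation used in the listings.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA listing (spaces and case ignored) in reading frame 1."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous or unknown symbols are not in the table -> 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of the listing above.
print(translate("ATGATGTTTT CTTGGTTCAA GCTGTTTCAC"))  # MMFSWFKLFH
# In-frame tail of the listing, including the ambiguous base 's'.
print(translate("CTGTATsTGGTCGTGTTCAAACCGTTTTGA"))    # LYXVVFKPF*
```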

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647) 
ORF102 and HP1484 show 33% aa identity in 143aa overlap: 

orf102 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 

HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK--KLYSFIASPAM 65 

orf102 63 GAVVFGAAI PFAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 

G + + + GW+H KL L ++LLAY YC +R + + R+Y 

HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125 

orf102 120 RVFNEIPXXXXXXXXXXXXFKPF 142 

RVFNE P KPF 
HP1484 126 RVFNEAPTILMILIVILVWKPF 148 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142aa overlap with an ORF (ORF102a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf102.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf102a    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf102.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf102a    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

orf102.pep VFNEIPVLLMVAALYXVVFKPFX 
           ||||||||||||||| ||||||| 
orf102a    VFNEIPVLLMVAALYLVVFKPFX 



The complete length ORF102a nucleotide sequence <SEQ ID 761> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 
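
The nucleotide listings in this document follow a fixed layout: blocks of ten bases, five blocks per line, each line prefixed by the position of its first base. As an illustration only, the sketch below regenerates that layout from a raw sequence string; the function name `format_listing` and its parameters are hypothetical and are not taken from the original text.

```python
def format_listing(seq: str, block: int = 10, per_line: int = 5) -> str:
    """Lay out a sequence as numbered lines of space-separated blocks."""
    seq = "".join(seq.split())          # drop any existing spaces or newlines
    width = block * per_line            # residues per output line (50 by default)
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # Position label is 1-based, as in the listings.
        lines.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(lines)


print(format_listing("ATGATGTTTTCTTGGTTCAAGCTGTTTCACTTGTTTTTTGTCATTTCGTGGTTTGCAGGG"))
```

Run on the first 60 bases of the sequence above, this prints one full line starting at position 1 and a second line starting at position 51.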



This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 



ORF102a and ORF102-1 show complete identity in 142 aa overlap: 

orf102a.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf102-1    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

70 80 90 100 110 120 

orf102a.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf102-1    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf102a.pep VFNEIPVLLMVAALYLVVFKPFX 
            ||||||||||||||||||||||| 
orf102-1    VFNEIPVLLMVAALYLVVFKPFX 

130 140 



Homology with a predicted ORF from N. gonorrhoeae 

ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. 
gonorrhoeae: 
orf102.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
           ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| 
orf102ng   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 

orf102.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
           |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| 
orf102ng   GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

orf102.pep VFNEIPVLLMVAALYXVVFKPF 
           ||||||||||||||| |||||| 
orf102ng   VFNEIPVLLMVAALYLVVFKPF 



The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 



This encodes a protein having amino acid sequence <SEQ ID 764>: 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf102-1.pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
             ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| 
orf102ng     MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf102-1.pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
             |||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| 
orf102ng     GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 

130 140 
orf102-1.pep VFNEIPVLLMVAALYLVVFKPFX 
             ||||||||||||||||||||||| 
orf102ng     VFNEIPVLLMVAALYLVVFKPFX 
130 140 
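
The identity figure quoted for this alignment can be reproduced directly from the two protein sequences. The sketch below is an illustration under stated assumptions rather than the method used for the original analysis: it assumes an ungapped, full-length alignment (both sequences are 142 residues, excluding the stop), and the function name `percent_identity` is hypothetical. The sequences are those of SEQ ID 760 and SEQ ID 764, read with VV where the scanned text shows W.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


ORF102_1 = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL"
    "GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR"
    "VFNEIPVLLMVAALYLVVFKPF"
)
ORF102NG = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL"
    "GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR"
    "VFNEIPVLLMVAALYLVVFKPF"
)

# Two substitutions (V36A and W77R) over 142 residues.
print(round(percent_identity(ORF102_1, ORF102NG), 1))  # 98.6
```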

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori: 

gi|2314656 (AE000647) conserved hypothetical integral membrane protein 
[Helicobacter pylori] Length = 148 
Score = 79.2 bits (192), Expect = 1e-14 

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

Query: 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 
F W K FH+ VISW A LFYLPR+FV A + V++ +LY F++ 
Sbjct: 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK--KLYSFIASPAM 65 

Query: 63 GAVVFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115 
G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjct: 66 GFTLITGILMLLIEFTLFKSG GWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 
R+YRVFNE P KPF 
Sbjct: 122 ARFYRVFNEAPTILMILIVILWVKPF 148 
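
The summary line of the database hit above (identities, positives and gaps as count/length pairs with a percentage) can be parsed and cross-checked mechanically. The following is a minimal sketch, not a description of any particular BLAST tool's API; the function name `parse_blast_summary` and the dictionary keys are hypothetical. In this particular summary line the reported percentages agree with integer truncation of count/length (13/147 is about 8.8%, reported as 8%), so the sketch compares against the truncated value; that this holds for other reports is an assumption.

```python
import re

SUMMARY = "Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%)"

# name = count/length (percent%)
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_summary(line: str) -> dict:
    """Extract each 'Name = n/m (p%)' field and recompute its truncated percentage."""
    result = {}
    for name, count, length, pct in FIELD.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        result[name] = {
            "count": count,
            "length": length,
            "reported": pct,
            "truncated": 100 * count // length,  # integer truncation, not rounding
        }
    return result


for name, field in parse_blast_summary(SUMMARY).items():
    print(name, field["count"], field["length"], field["reported"], field["truncated"])
```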
Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 91 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 

1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G 



201 I SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 



1 . . GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG 

451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA 

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 

551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 

601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 

651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA 

701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 

751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 

801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 

851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 

901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 

951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>: 

1 ..VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF85 shows 87.8% identity over a 41aa overlap and 99.3% identity over a 153aa overlap with 
an ORF (ORF85a) from strain A of N. meningitidis: 

10 20 30 40 

orf 85 . pep MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I I I I I I I I I I I I I I I I I I I I I I I Mill:: I I j I I I I I 
orf 85a MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 



orf 85. pep ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf 85a TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 



orf85.pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGK 
I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I II II I I I I I I I I I I I I I I I I I I : 
orf 85a GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 



orf85.pep AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGP 
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf85a    AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGP 
330 340 350 360 370 380 

230 

orf 85. pep PRRX 
I I I I 

orf85a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 770>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 

30 40 50 60 70 80 

or f 85a . pep PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf85-l VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 



90 100 110 120 130 140 

or f 85a. pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

orf85-l INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 



orf85a.pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 



orf85a.pep PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 



270 280 290 300 310 320 

or f 8 5a. pep GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEI DGVKNVLI IPSLTVKNRGG 

orf85-l GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVE I DGVKNVLI IPSLTVKNRGG 

220 230 240 250 260 270 

330 340 350 360 370 380 

orf85a.pep RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 
           :||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf85-1    KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 
280 290 300 310 320 330 



390 

orf85a.pep PPRRX 
I I I I I 

orf85-l PPRRX 

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a. 



Homology with a predicted ORF from N. gonorrhoeae 

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 

ORF85 1 MAKMMKWAAVAAVAAAAVWGGWS.LKPEPHVLDITETVRRG 40 

ORF85ng 1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50 

ORF85 ISFTILSEPDT 250 

ORF85ng 201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDT 250 

ORF85 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

ORF85ng 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 300 

ORF85 301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

ORF85ng 301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGKAFVRVLGADGKAVEREIRTGM 350 

ORF85 351 RDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393 

ORF85ng 351 KDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393 
The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 

101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 

151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG 

201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT 

451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 

551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 

1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 

1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 772>: 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 

351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 

30 40 50 60 70 80 

orf85ng PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 

I I I I I I I 1 I I I I I I I I I ! I 

orf85-l VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 



90 100 110 120 130 140 

orf 8 5ng INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 

orf85-l INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 



150 160 170 180 190 200 

orf85ng ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 

I : I I I I I I I I I I I I I I I I II 111:111 I I I I I I I I I I I I I I I 

orf85-1 AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 
100 110 120 130 140 150 

210 220 230 240 250 260 

orf85ng PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
orf 85-1 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

or f 8 5ng GGYNSST DTASNAVYYYARSFVPNPDGKLATGMTTQNTVE I DGVKNVLLI PSLTVKNRGG 

orf 85-1 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLI I PSLTVKNRGG 

220 230 240 250 260 270 



330 340 350 360 370 380 
orf85ng KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 
orf85-1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG 

orf85ng PPRRX 
orf85-1 PPRRX 


In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein: 

gi | 1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia 
coli] Length = 380 
Score = 193 bits (485), Expect = 2e-48 
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%) 

Query: 29 PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88 
P Y T VR GD+ ++V ATG++ V VGAQ SGQ+K L V +G +VKK L+ 
Sbjct: 41 PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100 

Query: 89 INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148 
I+ N I ++ L +A+ A+ L A Y RQ L + A S++ 
Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160 

Query: 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 208 
I++++ S++TA+++L YTRI A M G V I +GQTV AAQ 
Sbjct: 161 

Query: 209 
P I+ LA++ ML K Q++E D+ +K GQ FT+L +P T 
Sbjct: 221 

Query: 269 
Sbjct: 274 

Query: 329 
+V L +G+ ERE+ - E+ GL+ GD+VVI E 
Sbjct: 329 


Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF85-1 (40.4kDa) was cloned in the pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 



Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>: 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG 

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 

351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 

451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 

501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 

551 CGTAA 

This corresponds to the amino acid sequence <SEQ ID 774; ORF120>: 



1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR 

51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG 

101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP 

151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP* 

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF120 shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) from strain A of N. 
meningitidis: 



orf 120 .pep 



IPATMTFERSGNAYKIVSTI KV PLYNIRFE 
I I I I : II I I I I I I I I I I I I I I I I 

SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE 



orf120.pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 



100 110 120 130 140 150 

orf120.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 

orf120a AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 
130 140 150 160 170 180 
160 170 180 

orf 120 .pep SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

or f 1 2 0 a S LNN I PAQI G YT DDGKT YT LKLKSVQ IN GQAAKPX 

190 200 210 220 

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 

151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 

251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 

551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 

651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 



1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX 

51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 

101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf120a.pep MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK 

orf 120-1 MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 



10 20 30 40 50 60 

70 80 90 100 110 120 

orf120a.pep VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 

orf120-1 VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 



190 200 210 220 

DAVMYFFAP S LNN I PAQ I GYT DDGKT YT LKLKSVQ INGQAAKPX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
DAVMYFFAPS LNNI PAQ I GYT DDGKTYTLKLKSVQINGQAAKPX 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 

orf120.pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 30 

orfl20ng SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE 69 



orf120.pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 90 
           ||||||||||||:||:|||||||||||||||||||||||||||||||||||||||||||| 
orf120ng   SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129 
orf120.pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 150 

orf120ng AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP 189 

orf 120. pep SLNN I PAQIGYTDDGKTYTLKLKSVQINGQAAKP 184 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl20ng SLNNI PAQIGYTDDGKTYTLKLKSVQINGQAAKP 223 
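
The individual differences behind an identity percentage can be listed position by position. The sketch below is illustrative only: it assumes an ungapped alignment, the function name `substitutions` and its `offset` parameter are hypothetical, and the two fragments are taken from the second row of the alignment above (read with VV where the scanned text shows W). With `offset=70` the positions are expressed in the numbering of the full-length 223-residue proteins, where the fragment starts at residue 70.

```python
from typing import List


def substitutions(ref: str, other: str, offset: int = 1) -> List[str]:
    """List differences between two aligned sequences as e.g. 'T13A'.

    offset is the position number assigned to the first residue.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{pos}{o}"
        for pos, (r, o) in enumerate(zip(ref, other), start=offset)
        if r != o
    ]


MENINGOCOCCAL = "SGGTVVGNTLHPTYYRDIRRGKLYAEAK"
GONOCOCCAL = "SGGTVVGNTLHPAYYKDIRRGKLYAEAK"

print(substitutions(MENINGOCOCCAL, GONOCOCCAL))             # ['T13A', 'R16K']
print(substitutions(MENINGOCOCCAL, GONOCOCCAL, offset=70))  # ['T82A', 'R85K']
```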

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 

51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT 

101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 

151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 

201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT 

251 ATAAAGACAT ACGCAGGGGC AAAGTGTATG CGGAAGCCAA ATTCGCCGAC 

301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 

351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 

401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 

451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT 

501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA 

551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 

601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA 

651 CGGACAGGCC GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG 

51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD 

101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 

151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY 

201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap: 

10 20 30 40 50 60 

orf 120-1. pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

I I I I I I I I I I I I I I Ill I I I I II I I I I I I I I I I I I I ! I II I I I I I I I 

orfl20ng MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 0-1. pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQS PKAM 

I I I I I I I I I : I I : I I I I I I 

orfl20ng VPLYNIRFESGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 120-1 . pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETE WKYRVRRGD 

I I I I I I I I I I I I I I I I I I I I I I I I I I II Ill 

orfl20ng DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 

130 140 150 160 170 180 

190 200 210 220 

orf 120-1 .pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
I : I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
orfl20ng DTVT YFFAPSLNNI PAQ I GYT D DGKT YTLKLKSVQINGQAAKPX 

190 200 210 220 

This analysis, including the presence of a putative leader sequence in the gonococcal protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
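
The text does not say how the putative leader sequence was identified. Purely as an illustration of one common heuristic, the sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale; a run of strongly positive window averages near the N-terminus is typical of the hydrophobic core of a leader peptide. The choice of scale, the window size of 9 and the function name `hydropathy_profile` are assumptions, not taken from the original analysis. The example input is the first 40 residues of the gonococcal protein above.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumption: this scale, not necessarily the one used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each window of the given size along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


N_TERMINUS = "MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGI"
profile = hydropathy_profile(N_TERMINUS)
peak = max(range(len(profile)), key=profile.__getitem__)
# Report the most hydrophobic window (1-based start position) and its mean score.
print(peak + 1, round(profile[peak], 2))
```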



Example 93 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>: 
1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 . GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT.. 

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 
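The listings in these examples are laid out as five blocks of ten characters per line, each line prefixed with the position of its first character. A small helper that reproduces this layout is sketched below; the function name `format_blocks` and its parameters are illustrative assumptions, not part of the specification. It can be useful when re-flowing a sequence whose columns have been scrambled.

```python
def format_blocks(seq: str, block: int = 10, per_line: int = 5) -> list[str]:
    """Return the sequence as numbered lines of space-separated blocks."""
    seq = "".join(seq.split())
    width = block * per_line
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The prefix is the 1-based position of the first character on the line.
        lines.append(f"{start + 1} " + " ".join(blocks))
    return lines
```

The sequence above ends at position 1071, which is 357 codons: 356 residues plus the stop codon, consistent with the 356-residue protein that follows.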

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTL TPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLV LV GLDSGFAI GMLAG ILVFV 
251 PYLGAFTGLL LA TVAALLQF GSWNG ILSVW AVFAVGQFLE SF FITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MG FVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orf 121 . pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

11 I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 121a MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 



orf 121. pep 



70 80 90 100 110 120 

ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
130 140 150
EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNI
||||||||||||||||||||||||||||||||||||
EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
130 140 150 160 170 180 
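The identity figure quoted for an ungapped overlap can be recomputed by counting matching positions. The sketch below assumes the two sequences are already aligned with no gaps; the function name `percent_identity` is illustrative, and the program that produced the figures in this document is not named here. An undetermined residue `X` is counted as a mismatch.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in two aligned,
    equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Residues 1-60 of ORF121 and ORF121a, copied from the alignment above.
ORF121_1_60 = "MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
ORF121A_1_60 = "MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
identity_1_60 = percent_identity(ORF121_1_60, ORF121A_1_60)
```

The two sequences differ only at positions 14 and 17 across the 156-residue overlap, so 154 of 156 positions match, which is about 98.7% and agrees with the stated value.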



The complete length ORF121a nucleotide sequence <SEQ ID 785> is:



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT

651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT

751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC

1051 AGTTTTTACC GGGGCAGGTA



This encodes a protein having amino acid sequence <SEQ ID 786>: 



1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 

151 RQGGNIVS SI GNLLLLPLLL YYFLL DWQRW SCGIAKLVPR RFAGAYTRIT 

201 GNLNEVLGEF LRGQL LVMLI MGLVYGLGLV LV GLDSGFAI GMVAG ILVFV 

251 PYLGAFTGLL LA TVAALLQF GSWNG ILAVW AVFAVGQFLE SF FITPKIVG 

301 DRIGLSPFWV IFSLMAFGQL MG FVGMLAGL PLAAVTLVLL REGVQKYFAG 

351 SFYRGR* 

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap:

10 20 30 40 50 60 

MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I 1 I I I I I I I I I I I I I I II I I I I 1 i I I I I I I I 
MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
10 20 30 40 50 60 



orf 121a. pep 
orfl21-l 



70 80 90 100 110 120 

orf 121a . pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

I II I I I I Nil I I I I Ill 

orf 121-1 ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf121a.pep  EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1     EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW

130 140 150 160 170 180 



orf 121a. pep 



190 200 210 220 230 240 

SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
orf121-1 SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 121a . pep GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 
I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I : I I I I I I I I I I I I I I I I I I I I I I 
orf 121-1 GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 

310 320 330 340 350 

orf 12 la. pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 
I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf 12 1-1 DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 

310 320 330 340 350 



Homology with a predicted ORF from N. gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf 121 . pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

II I I I : I I I I I I I I I I I I I I I I I 

Orfl21ng MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

orf 121 . pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 

II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orfl21ng ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 

orf 121. pep EIDQAS I IAWLQAHTGELSNALKAWFPVLMRQGGNI 156 

orfl21ng EIDQAS I IAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180 

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 



1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTL TPFAVAA VLAYVLDPLV 

51 EWLQKKGLNR ASASMS VMVF SLILLLALLL IIV PMLVGQF NNLASRLPQL 

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 

151 KQGGNIVS TI GNLLLPPLLL YYFLL DWHRW SCGIPKLVPR RFAGAYTRIT 

201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 

251 GGG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>:



1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC

1051 AGTTTTTACC GGGGCAGGTA G

This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>:

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV

51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL

101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM

151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT

201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV

251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG

301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG

351 SFYRGR*



ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap:

                     10        20        30        40        50        60
orf121-1.pep MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
             ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1   MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf121-1.pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf121-1.pep EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
             ||||||||||:|||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1   EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf121-1.pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
             |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1   SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf121-1.pep GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
             ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1   GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
                    250       260       270       280       290       300

                    310       320       330       340       350
orf121-1.pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
             ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1   DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
                    310       320       330       340       350
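Listing the positions at which two aligned sequences differ makes an alignment such as the one above easier to audit. The following sketch assumes an ungapped alignment of equal-length sequences; the function name `substitutions` is illustrative and is not taken from the specification.

```python
def substitutions(ref: str, other: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in ref, residue in other) for
    every position at which two aligned, ungapped sequences differ."""
    if len(ref) != len(other):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(ref, other), start=1)
        if a != b
    ]


# Residues 1-60 of ORF121-1 and ORF121ng-1, copied from the alignment above.
ORF121_1_BLOCK = "MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
ORF121NG_1_BLOCK = "MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR"
first_block_differences = substitutions(ORF121_1_BLOCK, ORF121NG_1_BLOCK)
```

Across the full 356-residue overlap the two listed sequences differ at nine positions, so 347 of 356 positions match, which is about 97.5%.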



In addition, ORF121ng-1 shows homology to a permease from H. influenzae:

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349
Score = 69.9 bits (168), Expect = 2e-11

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)

Query: 26 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct: 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAHTGELSNALK 143 
ML Q +L S LP + N WL N Y E ID + + + F + ++ + 

Sbjct: 92 MLWNQTISLLSDLPAMF NKSNEWLLNLPKNYPELI DYSMVDS I FNSVREKILGFGE 147 

Query: 144 AWFPVLMKQGGNIVSSIGNXXXXXXXXXXXXXDWQRWSCGIAKLVPRRFAGAYTRITGNL 203 



Sbjct: 



204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 

+ + ++ G+ + + G+ V VPY 

207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 
Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323

QFG + FAV QL+ +P+ ++LP +1 S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYIIIAFAVSQLLDGNLLVPYLFSEAVNLHPLIIIISVLIFGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340 

G+ +PLA + ++ 
Sbjct: 327 WGVFFAI PLATLVKAVI 343 
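The summary line of the database hit above reports counts as fractions followed by whole-number percentages. The sketch below parses such a line with a regular expression and recomputes the percentages; the names `STAT_PATTERN` and `parse_blast_stats` are illustrative assumptions. Integer division is used because it reproduces the three percentages printed in the report above.

```python
import re

# Matches fields such as "Identities = 67/317" in a report summary line.
STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_stats(line: str) -> dict[str, tuple[int, int, int]]:
    """Map each statistic name (lower case) to (count, length, percent)."""
    stats = {}
    for name, num, den in STAT_PATTERN.findall(line):
        count, length = int(num), int(den)
        stats[name.lower()] = (count, length, 100 * count // length)
    return stats


summary = "Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)"
hit_stats = parse_blast_stats(summary)
```

For the line above this yields 21% identities, 37% positives and 2% gaps over an alignment length of 317.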

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies. 
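The specification does not state which method was used to predict the transmembrane domains. As an illustration only, one widely used approach is a sliding-window hydropathy profile on the Kyte-Doolittle scale, in which windows of about 19 residues with a mean score of at least 1.6 are treated as candidate membrane-spanning segments. The sketch below assumes that approach; the function names, the window size and the threshold are assumptions and not the method of this document.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues score 0."""
    residues = "".join(protein.split())
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in residues]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Residues 61-100 of ORF121-1, copied from the listing above.
segment = "ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQL"
hydrophobic_windows = candidate_tm_starts(segment)
```

In this segment the leucine- and isoleucine-rich stretch scores well above the threshold, whereas the charged N-terminus (MYRRKGRGIK) has a negative mean.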



Example 94 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>: 

1 . . ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 

51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 

301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG..

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 



1 . . TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>:

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR

101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV

251 RHRLCS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF122 shows 94.0% identity over a 182aa overlap with an ORF (ORF122a) from strain A of N. 
meningitidis: 



orfl22.pep 
orfl22a 



TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 
I I I I I I : I I I I : I I I I I I I I I I I I I I I I 
FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI



orfl22.pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I ! I I I I I I I I 
orf122a LTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAFXVDARNVYAQIGGDVGTHLR



orf122.pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT



orf122.pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ

I I I I I I I I I I 1 I I I I 

orfl22a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 
210 220 230 240 250 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 



1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA

101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG

151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT

251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC

301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG

351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG

401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT

751 CGTCATCGTT TGTGTTCCTG



This encodes a protein having amino acid sequence <SEQ ID 796>: 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 T AFSAAMRLS SSCVVIFL SF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 

101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 

151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 

251 RHRLCS* 

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 



orf122a.pep SSCVVIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF
orf122-1 SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF
XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI
DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 



FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 



250 

DIVALSDTDVRHRLCSX 

I I I I I I I I 

DIVALSDTDVRHRLCSX 
250 



Homology with a predicted ORF from N. gonorrhoeae

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N. gonorrhoeae: 

orf122.pep TAFSAALRLSPSXLVIFLSFGKPYQQTAAI 30

orf122ng FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI 80

orf 122 .pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 90 

I I I I I I I I I I I I I I I I I I I I I I I I I I I: I I I I : : I I I I I I 

orfl22ng LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140 

orf 122 .pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150 

III I M !| I III II I hi INI II II II I! [I 11111:1111:111111 

orfl22ng NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200 

orf 122. pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 

: I I i I I I I I I I 

orfl22ng EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256 

The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 



1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC

51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa

101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG

151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT

201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT

251 TTTGCACGtc ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc

301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG

351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC

451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT

551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC

601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT

701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT

751 CGTCATCGTT TGTGTTCCTG



This encodes a protein having amino acid sequence <SEQ ID 798>:

1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 

51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR

101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 

151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 

201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 

251 RHRLCS* 
ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap:

10 20 30 40 50 60 

orf 122-1. pep ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVE PVPMPIYSFSGTNSTAFSAAMRLS 

orfl22ng MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf122-1.pep SSCVVIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF

orfl22ng SSCVVIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122-1. pep DVDARNVYAQIGGDVGTHLRNVRRE FGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

orfl22ng DIDARNIDTQIGGDVGTHLRNVRCE FGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122-1 . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
I I I I I I I I : I I I I : I I I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I I 
orf122ng FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV

190 200 210 220 230 240 

250 

orf 122-1. pep DIVALSDTDVRHRLCSX 
I I I I I I I I I : I II I I I I 
orfl22ng DIVALSDTDIRHRLCSX 
250 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>: 

1 . . GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 

51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 

101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 

151 ATGSGGCGGA JTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 



1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP
51 MGGFDCRLFR LETA* 

Further work revealed the complete nucleotide sequence <SEQ ID 801>:

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>:

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ
401 SLQRNPS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) from strain A of N. 
meningitidis: 

10 20 30 

orf125.pep AGASANNISARFAETPVAVSVTLIGTVLAV



orf 125 . pep MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 

: I I I I I I I I I: 

orf 125a LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 

310 320 330 340 

The ORF125a partial nucleotide sequence <SEQ ID 803> is:



1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C. 

This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 



1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH

51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA

101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT

151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP

201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL

251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV

301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG..

ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 



MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 



AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
I I I I I I I I I I I I I I I I I I I I I ! I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 
70 80 90 100 110 120 



ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 

I I I I I I I I I I I I I I I I I I Ill I II I I 

ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD 
130 140 150 160 170 180 

190 200 210 220 230 240 

GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 
190 200 210 220 230 240 



orf 125a. pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
: I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 125-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 
310 320 330 340 350 360 

Homology with a predicted ORF from N. gonorrhoeae 

ORF125 shows 86.2% identity over a 65aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 

orf125.pep AGASANNISARFAETPVAVSVTLIGTVLAV 30

orf125ng KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 308

orf 125. pep MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64 

I I I I I I I : I I I I I I 111:11 I I I I I I I I I : I I 
orfl25ng MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343 

An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 



1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT

151 VSMLLMLLAV LWLSVEVFAS SGTKAAPAVS DGMTFGTAVE LSAVMPLSWL

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT

301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA*
Further work revealed the following gonococcal DNA sequence <SEQ ID 807>:

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 

401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC

451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 

501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 

551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 

601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 

651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 

701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 

751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 

801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 

851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 

901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-l>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH

51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA

101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT

151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL

201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI

251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT

301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD

351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT

401 QSLQRNPS*

ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap:

10 20 30 40 50 60 

orfl25-l.pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

orfl25ng-l MSGNAS S PSS SAAIGLVWFGAAVS I AEI STGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

AYIGALTGRS SMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVS SALGKVLWDG 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MM: 

AYIGALTGRSSMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVSSALGKVLWDG
70 80 90 100 110 120 

130 140 150 160 170 179 

ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 

ESFVWWALANGALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFASSGTNAAPAVS 
130 140 150 160 170 180 

180 190 200 210 220 230 239 

DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 

DGMT FGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 
190 200 210 220 230 240 

240 250 260 270 280 290 299 

orf125-1.pep FTGETDVAKILLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT

M M M M M M M M I Hllllllllllllllhllllllllllllllll M M M I 
orf125ng-1 FTGETDVAKILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVT
250 260 270 280 290 300 



orfl25-l.pep 
orf 125ng-l 



orf 125-1 .pep 
orfl25ng-l 



orf 125-1. pep 
orf!25ng-l 
300 310 320 330 340 350 359

orf 125-1. pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

I I I I I I I I I I I I I I I : I I I I I I I I I I 

orfl25ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 
310 320 330 340 350 360 

360 370 380 390 400 

orf125-1.pep FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX

I I I I! I I M I I I I I I I I I I I I I I I I 

orfl25ng-l FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 
370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 96 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 

151 CCTGCAGCGG A. ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 

451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG . . 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 

51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ

151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 811>:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 

201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT

501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 

701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG
801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG
901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 
This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>:

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ

151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV

251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA

351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180aa overlap with an ORF (ORF126a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 

I I I I I I I I I I I I I f 1 I I I I ! I I ! I I I : I I I I I I I I I I I I I I I I I I I I : 

orf 12 6a MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
10 20 30 40 50 60 

70 80 90 100 110 120 

Orf 12 6. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 

llllllll 111111111:1:1 :ll MINI :IHI!IIIII :|| I 

orf 126a EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6. pep VRWRADDIAEREPQLGGRFXDGI YLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 

I I I I I I I I I I I I i I II I I I llllllll IMIII: I II I I I I I I I I I I I I I I I I I : I I 
O r f 1 2 6a VRWRADDIAERE PQLGGRFSDGI YLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence <SEQ ID 813> is:

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 

251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 

301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 

551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 

601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 

801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 

901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 

1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 

51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK

101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 

201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT

301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA

351 PERDEESGLA YIRRQD*



ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 



orf126a.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP
orf126-1    MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP

orf126a.pep EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI

orf126-1    EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
70 80 90 100 110 120 



130 140 150 160 170 180 

orf126a.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE

orf126-1    VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE

130 140 150 160 170 180 



190 200 210 220 230 240 

orf126a.pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP

orf126-1    GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP
190 200 210 220 230 240 



250 260 270 280 290 300 

orf126a.pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT

I I I I ! I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I It : I I I I I I I I I II I M I I I I I 
orf126-1    LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT

250 260 270 280 290 300 

310 320 330 340 350 360

orf 12 6a. pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKXAPERDEESGLA 

orf 12 6-1 LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 
310 320 330 340 350 360 




orf 12 6a. pep YIRRQDX 
I I I I I I I 

orf 12 6-1 YIRRQDX 


Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 



or f 12 6 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 

: I I I I I Mil: hllllllllll : I I I I I 

orfl26ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 12 6. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120 

I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I II I : I I I I 

orf126ng     EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 120

orf126.pep   VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180

orfl2 6ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180 

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino 
acid sequence <SEQ ID 816>: 



1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS

251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 

301 LNHHNPEIRY SRERRLIEIM GLFRHGFM IS PAVTAAAVRL AVAL F DGKDA 

351 PERDEESGLA YIGRQD*

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>:

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 

51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 

101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 

151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 

201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 

251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 

301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA

351 PERDEESGLA YIGRQD* 

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap:

10 20 30 40 50 60 

orfl2 6-l.pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

orfl26ng-l MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 



orf126-1.pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI
orf126ng-1   EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI



orfl2 6-l.pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 
I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I : I : 
orf126ng-1   VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ
250 260 270 280 290 300

or f 12 6-1 . pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I : I I I I I 
orfl2 6ng-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

or f 12 6-1 . pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 
1111111111:1 I I I I I I I I I I I I I I I I ! I I I I I I I : I I I I I I I I I I I I I I I I : I I I I I 
orfl2 6ng-l LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA 

310 320 330 340 350 360 



orfl2 6-l.pep YIRRQDX 
orfl26ng-l YIGRQDX 

Furthermore, ORF126ng-l shows homology to a putative Rhizobium oxidase flavoprotein: 

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327 
Score = 169 bits (423), Expect = 3e-41 

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%) 

Query: 3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 

RI V G G++G A QL G+++ L ++ G 
Sbjct: 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60

Query: 63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA LPGHVHRRGTLWAGGRDTGELDRFSRRTS-GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 

IA EP L GRF ++ E LD RQ L+ALA L++ + + 
Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRVVDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331 

+ P R ++E R + +NGL+RHGF+++P 
Sbjct: 279 DNLP--RVTQEGRTLHVNGLYRHGFLLAP 305
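
The percentages in the BLAST summary line for the Rhizobium etli hit can be checked against the raw counts. The sketch below assumes that the reported percentages are truncated to whole numbers (integer division). That convention is inferred from the reported figures and is not stated in the text, and `blast_percent` is a hypothetical helper name.

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return (100 * count) // aligned_length


# Counts copied from the summary line: Identities, Positives and Gaps over 329 columns.
ALIGNED_LENGTH = 329
summary = {
    "Identities": blast_percent(112, ALIGNED_LENGTH),
    "Positives": blast_percent(163, ALIGNED_LENGTH),
    "Gaps": blast_percent(25, ALIGNED_LENGTH),
}

for label, pct in summary.items():
    print(f"{label}: {pct}%")
```

Under this assumption the three values come out as 34, 49 and 7, which agrees with the summary line.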

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 97 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
819>: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 

351 TGAAAATCTA GTAACCTTTA aTTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA 

451 GTAG 
This corresponds to the amino acid sequence <SEQ ID 820; ORF127>:

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 821>:



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>:



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 
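
The correspondence between a listed DNA sequence and its amino acid sequence can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1. The names `translate` and `CODON_TABLE` are illustrative, and the text does not say which software the applicants used.

```python
# Standard genetic code, built from the conventional TCAG ordering (assumption:
# the listed ORFs are translated with the standard codon assignments).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1; whitespace is ignored, unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing incomplete codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 50 nucleotides of SEQ ID 821 as listed above.
first_line = "ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT"
print(translate(first_line))  # MTDNRGFTLVELISVV
```

The result reproduces the first sixteen residues of ORF127-1, including the two consecutive valines at positions 15 and 16. Codons containing an undetermined base translate to X, as in the partial sequences.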

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF127 shows 98.0% identity over a 150aa overlap with an ORF (ORF127a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I 
MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 
10 20 30 40 50 60 

70 80 90 100 110 120 

GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 

GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 
70 80 90 100 110 

130 140 150 

VTFICKKSAS SCSDGLDYFKGNDKDCKLLKX 

VTFICKKSAS SCSDGLDYFKGNDKDCKLLKX 
120 130 140 150 

The complete length ORF127a nucleotide sequence <SEQ ID 823> is: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This encodes a protein having amino acid sequence <SEQ ID 824>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 



ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap:

10 20 30 40 50 60 

orf127a.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN

orf127-1    MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN

10 20 30 40 50 60 

70 80 90 100 110 120 

orf127a.pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV
I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
70 80 90 100 110 120 

130 140 150 

orf 127a. pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

I I I I I I I I I I I I I I I I I I I I 

orf 127-1 TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
130 140 150 
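
The identity figure quoted for this alignment can be reproduced from the two listed protein sequences. The sketch below assumes that percent identity means identical positions divided by the aligned length, with gap columns never counted as matches. The alignment program is not named in the text, and `percent_identity` is a hypothetical helper. The sequences are typed from SEQ ID 824 and SEQ ID 822, which differ only at residue 41 (T in ORF127a, A in ORF127-1).

```python
def percent_identity(a: str, b: str) -> tuple[int, float]:
    """Return (identical positions, percent identity) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, 100.0 * matches / len(a)


# ORF127a (SEQ ID 824) without the terminal stop; spaces only group residues in tens.
ORF127A = (
    "MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA "
    "HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK "
    "AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK"
).replace(" ", "")

# ORF127-1 (SEQ ID 822): same sequence with alanine at residue 41.
ORF127_1 = ORF127A[:40] + "A" + ORF127A[41:]

matches, pct = percent_identity(ORF127A, ORF127_1)
print(f"{matches}/{len(ORF127A)} identical = {pct:.1f}%")
```

One mismatch in 149 positions gives 148/149, which rounds to the 99.3% stated for this comparison.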

Homology with a predicted ORF from N. gonorrhoeae

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from 
N. gonorrhoeae: 

orf127.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60
orf127ng   MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60

orf 127 .pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120 

I I I I I I ! I I I II I I I I I I I I I I I I I I I I II I I I I I 

orfl27ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119 

orf 127. pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf127ng   VTFICKKSASSCSDRLDYFKGNDKDCKLLK 149

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK*

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 

10 20 30 40 50 60 

orf127-1.pep MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN

orf127ng-1   MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 127-1. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 

orfl27ng-l GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

70 80 90 100 110 120 
                    130       140       150
orf127-1.pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX
             ||||||||||||||||||||||||||||||
orf127ng-1   TFICKKSASSCSDGLDYFKGNDKDCKLLKX
                    130       140       150



This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 98 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>:



1 ..GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 

51 CAACCAAATG CGGAAAACCC GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 

101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 

151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 

201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 

251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 

301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 

351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 

401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 

451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT

501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 

551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 

601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 

651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 

701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 



This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 



1 ..VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 

51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 

101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 

151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 

201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 



Further work revealed the complete nucleotide sequence <SEQ ID 829>:

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA 

1851 CGGCGGCGCA TTGCAGTAG 

This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>:

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKSSHGGA LQ*

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723)
ORF128 and HI0392 show 52% aa identity in 180aa overlap: 

Orfl28: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60 
++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV 
HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orfl28: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120 

E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS 

HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165 

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180

LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T 

HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF128 shows 98.0% identity over a 244aa overlap with an ORF (ORF128a) from strain A of N.
meningitidis: 

10 20 30 

orf 12 8 .pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 

I I I I I I I I I I I I I I I 

orf 128a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF

60 70 80 90 100 110 

40 50 60 70 80 90 

orf 128. pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 

             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

orf 128a LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 
120 130 140 150 160 170 

100 110 120 130 140 150 

orf 128 .pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 
240 250 260 270 280 290 



220 230 240 

orf 128 .pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 

I I I I I I ! I I I II I I I I I I I I I 

orf 128a VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 
300 310 320 330 340 350 



orf 128a KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH 
360 370 380 390 400 410 

The complete length ORF128a nucleotide sequence <SEQ ID 831> is:

1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG 

1851 CGACGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 832>: 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT

51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN

101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY

151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY

201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV

251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY

301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR

351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH

401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD

451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR

501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG

551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY

601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 




MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGI ILSEIQNG 
I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGI ILSEIQNG 

SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 

TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

I II I I I I I I I I I I I M I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I 
SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 

RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 



YMGRE FHKHERLLKS SRDGALQX 
I I I I I I I I I I I I I I I I : I I I I I 
orfl28-l YMGRE FHKHERLLKS SHGGALQX 
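
The identity figures quoted for these pairwise alignments can be recomputed from any pair of aligned rows. The following is a minimal sketch, not the program used to generate the alignments in this document; the function name `percent_identity` is hypothetical, and it assumes the two rows are already aligned, of equal length, and use `-` for gaps. The example rows are the final row of the ORF128a / ORF128-1 alignment above with the terminal `X` omitted.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns in which both rows carry the same residue.

    Assumes the rows are pre-aligned and equal in length; '-' marks a gap
    and never counts as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * matches / len(row_a)


# Final row of the ORF128a / ORF128-1 alignment (stop marker X omitted).
row_orf128a = "YMGREFHKHERLLKSSRDGALQ"
row_orf128_1 = "YMGREFHKHERLLKSSHGGALQ"

# 20 of the 22 columns are identical (R/H and D/G differ).
print(round(percent_identity(row_orf128a, row_orf128_1), 1))
```

Applied to all rows of an alignment concatenated together, the same function gives the overall figure for the full overlap.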

Homology with a predicted ORF from N. gonorrhoeae

ORF128 shows 93.4% identity over a 244 aa overlap with a predicted ORF (ORF128ng) from N.
gonorrhoeae:

orf128.pep                                 VSLASVIASQIFLYEDFNQMRKTVELSAVF 30
orf128ng     ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep   LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90
orf128ng     LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep   ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150
orf128ng     ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep   RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210
orf128ng     RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292

orf128.pep   VFVGKISYSLYLYHWIFIAFAP1IRGGKQLGLPA
orf128ng     VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is:

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC
51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC
151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT
251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC
301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA
351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG
401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT
501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT
551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC
701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG
751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG
801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT
901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA
1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC
1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC
1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT
1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG
1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC
1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG
1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA
1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT
1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC
1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG
1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT
1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG
1851 AGGCGGCGCA TTGCAGTAG



This encodes a protein having amino acid sequence <SEQ ID 834>: 



1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap:

orf128-1.pep  MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
orf128ng      MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep  SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
orf128ng      SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep  QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
orf128ng      RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep  SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
orf128ng      SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep  FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
orf128ng      FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep  SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
orf128ng      SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep  FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
orf128ng      FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep  DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
orf128ng      DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep  PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
orf128ng      PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep  RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
orf128ng      RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep  YMGREFHKHERLLKSSHGGALQX



In addition, ORF128ng shows homology to a hypothetical H. influenzae protein:

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62
Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)

Query: 38  VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct: 1   MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query: 98  DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+  FLSN YLG   GYFDLSA+ENPVLHIWSLAVE Q         I
Sbjct: 61  DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
           YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121

Query: 218
Sbjct: 181

This analysis, including the identification of several putative transmembrane domains, suggests that
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.

Example 99

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 . . ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT 

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC. . 

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>: 



Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

orf129.pep         IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
orf129a      MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129.pep   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
orf129a      ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG



The complete length ORF129a nucleotide sequence <SEQ ID 839> is: 
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap:

orf129a.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
orf129-1     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
orf129-1     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
orf129-1     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
orf129-1     EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep  KRYNPQHRX
orf129-1     KRYNPQHRX



Homology with a predicted ORF from N. gonorrhoeae

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep         IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
orf129ng     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep   ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
orf129ng     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino 
acid sequence <SEQ ID 842>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF
101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN
151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 



1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 
51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ACSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT AALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
orf129ng-1    KRYNPQHRX

In addition, ORF129ng-l is homologous to an ABC transporter from A.fulgidus: 

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP) 
[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%) 

Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI+I +F P+ GI + E A G +AL 

Sbjct: 58 ISTAYVEVIRGTPLLVQILI VYFGLPAIGINLQPEPA GIIAL 99 

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184 

SGAYI EI RAGI + SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159 

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217 
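
The whole-number percentages in the summary line of this report can be checked by hand. The short sketch below is illustrative only; the helper name `whole_percent` is hypothetical, and it assumes the percentages are truncated rather than rounded, which is consistent with the printed figures (103/178 is about 57.9%, printed as 57%).

```python
def whole_percent(count: int, total: int) -> int:
    """Whole-number percentage, truncated toward zero via integer division."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (100 * count) // total


# Counts taken from the summary line of the A. fulgidus glnP report above.
summary = (("Identities", 86), ("Positives", 103), ("Gaps", 18))
for label, count in summary:
    print(f"{label} = {count}/178 ({whole_percent(count, 178)}%)")
```

The same helper reproduces the 55% and 67% figures quoted for the H. influenzae HI0392 comparison (124/225 and 152/225).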

This analysis, including the identification of transmembrane domains in the two proteins, suggests
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies.
Example 100

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 



1 ..CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA
51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT
101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC
151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA
201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT
251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT
301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc
351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA
401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC
451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC
501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT
551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr





This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI

51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL

101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA

151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC
901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 
951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:

orf130.pep   LKECRLKDPVFIPNIVYKNIAITFLLLHAA

orf130.pep   AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX

orf130.pep   LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
orf130a      LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA

orf130.pep   FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
orf130a      VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX

The complete length ORF130a nucleotide sequence <SEQ ID 849> is:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT
51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG
151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT
201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA
251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTCC
301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT
351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG
401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG
451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA
551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT
601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC
651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC
701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA
751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT
801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC
901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT
951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG
1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC
1051 GCGTTTACAG ACGATCCGGA ATAA


This encodes a protein having amino acid sequence <SEQ ID 850>: 



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap:

orf130a.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
orf130-1     MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
orf130-1     KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV

orf130a.pep  YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
orf130-1     YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
orf130-1     LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep  IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
orf130-1     IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE



Homology with a predicted ORF from N. gonorrhoeae 

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from
N. gonorrhoeae:

orf130.pep                                 LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30
orf130ng     LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep   AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90
orf130ng     AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep   LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150
orf130ng     LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep   FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193
orf130ng     VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:

1 MNKFFTHPM? PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>: 



1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT
51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT
101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG
151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT
201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC
251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC
301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT
351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG
401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG
451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA
501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA
551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG
601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA
651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA
701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC
751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC
801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC
851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC
901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA
951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG
1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG
1051 TTTACAGACG ATCCGGAATA A

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:

orf130-1.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
orf130ng-1    MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL

orf130-1.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
orf130ng-1    KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA

orf130-1.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV
orf130ng-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI

orf130-1.pep  YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
orf130ng-1    YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130-1.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR
orf130ng-1    LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130-1.pep  IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX
orf130ng-1    IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 101 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA

351 CTGCTTGGAA AAG..

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR
101 TRDGKPLIET FKQGGFDCLE K..

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 
51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:



1 MEIRAIKYTA MAALLAFTVA G CRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW* 
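The relationship between the nucleotide and amino acid listings above can be checked with a short translation routine. The following is a minimal sketch only, assuming the standard genetic code and reading frame 1; the function name `translate` and the rule of writing `X` for any codon containing an unread base are illustrative choices that mirror the `X` residues in the listings, not a statement of the software actually used.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unreadable codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-mers
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # Any codon not in the table (e.g. containing '.' or 'N') maps to X.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 857.
print(translate("ATGGAAATTC GGGCAATAAA ATATACGGCA"))  # MEIRAIKYTA
# Codons spanning the unread base ('.') in SEQ ID 855.
print(translate("AAGTTTGAAGC.TGCGGG"))                # KFEXCG
```

The second call shows why the partial ORF131 translation carries an `X` where the complete sequence later resolves to alanine.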

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF131 shows 95.0% identity over a 121aa overlap with an ORF (ORF131a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orfl31.pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orfl31a MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
10 20 30 40 50 60 

70 80 90 100 110 120 

orfl31.pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 

Ill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill: 

orfl31a YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 

70 80 90 100 110 120 



orf131.pep K

orf131a KQGLRRNGLSERVRWX
130

The complete length ORF131a nucleotide sequence <SEQ ID 859> is:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA 

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA G CRLAGWYEC SSLSGWCKPR KPAAIDFWDI 
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap: 

orf 131a . pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 

orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orfl31a.pep YE I PLS DGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II : 

orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE 



orf131a.pep KQGLRRNGLSERVRWX

orf131-1 KQGLRRNGLSERVRWX

Homology with a predicted ORF from N. gonorrhoeae 

ORF131 shows 89.3% identity over 121 aa overlap with a predicted ORF (ORF131ng) from 
N. gonorrhoeae: 



orf131.pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED

orf131.pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE
I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I : I III I I I! I I
orf131ng YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE

orf131.pep K
orf131ng KQGLRRNGLSERVRW 134

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 



1 MEIRVIKYTA TAALFAFTVA GC RLAGWYEC LSLSGWCKPR KPAAIDFWDI 
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 



1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GtCCgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT 

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:



1 MEIRVIKYTA TAALFAFTVA G CRLAGWYEC SSLSGWCKPR KPAAIDFWDI 
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:



orf 131ng-l . pep MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 



orf 131ng-l .pep YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 
I I I I I I I I I I I II I I I I I I I I : I I I I I I II I I I I I I I I I I I ! I I I I I : I III I I I I I I 
orf 131-1 YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orfl31ng-l.pep KQGLRRNGLSERVRWX 
I I I I I I I I I I I I I I I I 
orfl31-l KQGLRRNGLSERVRWX 
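The identity figure quoted for this ungapped comparison can be reproduced directly from the two listed sequences. The sketch below assumes that "identity" means the fraction of aligned positions carrying the same residue; the helper name `percent_identity` is hypothetical, and the stop codon is left out of both strings.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues (ungapped, equal length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# SEQ ID 864 (ORF131ng-1) and SEQ ID 858 (ORF131-1), written as 10-mers.
ORF131NG_1 = (
    "MEIRVIKYTA" "TAALFAFTVA" "GCRLAGWYEC" "SSLSGWCKPR" "KPAAIDFWDI"
    "GGESPLSLED" "YEIPLSDGNR" "SVRANEYESA" "QKSYFYRKIG" "KFEACGLDWR"
    "TRDGKPLVER" "FKQEGFDCLE" "KQGLRRNGLS" "ERVRW"
)
ORF131_1 = (
    "MEIRAIKYTA" "MAALLAFTVA" "GCRLAGWYEC" "SSLTGWCKPR" "KPAAIDFWDI"
    "GGESPPSLGD" "YEIPLSDGNR" "SVRANEYESA" "QQSYFYRKIG" "KFEACGLDWR"
    "TRDGKPLIET" "FKQGGFDCLE" "KQGLRRNGLS" "ERVRW"
)

print(f"{percent_identity(ORF131NG_1, ORF131_1):.1f}% over {len(ORF131_1)} aa")
# 92.6% over 135 aa
```

Ten of the 135 positions differ, which gives 125/135, in agreement with the stated 92.6%.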

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
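A lipid attachment site of this kind is conventionally recognised by a short sequence motif that ends in the cysteine to be lipidated. The text does not say which tool or pattern was used, so the following is only a sketch under an assumption: the regular expression is written in the style of the PROSITE prokaryotic lipoprotein signature, `{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C`, and both the pattern and the 40-residue search window should be treated as illustrative.

```python
import re

# Assumed motif (PROSITE-style lipoprotein signature); the trailing C is the
# residue that would carry the lipid.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_cys(protein: str, window: int = 40):
    """Return the 1-based position of the candidate lipidated Cys, or None."""
    match = LIPOBOX.search(protein[:window])
    # match.end() is the 0-based index just past the C, i.e. its 1-based position.
    return match.end() if match else None


ORF131_1_N_TERM = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDI"
ORF131NG_1_N_TERM = "MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDI"

print(find_lipid_attachment_cys(ORF131_1_N_TERM))    # 22
print(find_lipid_attachment_cys(ORF131NG_1_N_TERM))  # 22
```

Under this assumed pattern both the meningococcal and gonococcal sequences match at the motif ending `...FTVAGC`, placing the candidate cysteine at residue 22.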

Example 102

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC . GGAA AATttCGGCG TTTCCGCCCG 

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 

751 AAAATTCGGC ACGGAACACG GCTGGCA. . 

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC 

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI A KEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV

401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH

451 GKLLEALR*

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical o457 protein of E.coli (accession number U14003) 
ORF132 and o457 show 58% aa identity in 140 aa overlap: 

Orf132: 4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63

IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE GI++ +G+DA+QL+ +
o457: 3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132: 64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123

D+ +IGN RG VEA+L +PY+SGPQWL + VL WVL VAGTHGKTTTA M
o457: 62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143

W+LE G PGF+IGGV G
o457: 122 TWILEQCGYKPGFVIGGVPG 141
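Where an alignment contains gaps, as in the first row above, identities and gaps are tallied column by column. The sketch below counts both for one row of the ORF132/o457 alignment; the function name `alignment_counts` is hypothetical, and conservative substitutions (the `+` marks) are not scored because that would require a substitution matrix not given in the text.

```python
def alignment_counts(query_row: str, subject_row: str, gap: str = "-"):
    """Return (identities, gaps, columns) for two aligned rows of equal length."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same number of columns")
    identities = 0
    gaps = 0
    for q, s in zip(query_row, subject_row):
        if q == gap or s == gap:
            gaps += 1          # a column where either row has a gap character
        elif q == s:
            identities += 1    # same residue in both rows
    return identities, gaps, len(query_row)


# First row of the ORF132 (residues 4-63) versus o457 (residues 3-61) alignment.
ORF132_ROW = "IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK"
O457_ROW = "IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q"

print(alignment_counts(ORF132_ROW, O457_ROW))  # (35, 1, 60)
```

Summing such per-row counts over all rows and dividing by the total number of columns gives an overall identity percentage of the kind quoted for the overlap.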



Homology with a predicted ORF from N. meningitidis (strain A)

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 132 . pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

II II I III III III!:! II Ml MM MMMM I I I I I I I I I I ] I I : M I I 

orf 132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 132. pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

I I I I I I M I I M M M M I M M I I I M I M I I I I I M I I II I I I I I I I ! I M I I I

orf 132a EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA
70 80 90 100 110 120 

130 140 150 160 

orf 132 . pep SMLAWVLEYAGLAPGFLIGGVXGKFR---RFRPPAANAAPRPEQPI AVFR

I I I I I I II II I I I I I I I I I I : I 1:1: I : : I : II 
orf 132a SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD 

130 140 150 160 170 



170 180 190 200 210 220

orf132.pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL

orf132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD
180 190 200 210 220 230

The complete length ORF132a nucleotide sequence <SEQ ID 869> is:



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT 

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG

801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 ACCAAACTGC TGGACGCTTT GAGATAG



This encodes a protein having amino acid sequence <SEQ ID 870>: 



1 MKHIHIIGIG GTFMGGIAAI A KEAGFEXSG CDAKMYPPMS TQLEALGIGV 

51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN

101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA 

301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT 

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA 

401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH 

451 TKLLDALR* 

ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap: 



orf 132a. pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf132a.pep EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA

orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf 132a . pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I i I I I I I I I I I I I I I M I I I I I I 

orf132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orf 132a . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 

I I I I I I I I I I I I I I I I I I II I I I I I I : I I I I I I I I I I I I 

orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orf 132a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 
I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I t I I I I I I : I hill I I I I I I I I I I I I 
orf132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orf132a.pep ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

Orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orf 132a . pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 
I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I 
orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132a . pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX 
I I I I I I I I I I I I I : I I I I II I I I I I I I I I I 111:1111 
orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 



Homology with a predicted ORF from N. gonorrhoeae

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N. 
gonorrhoeae: 

orf132.pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60
orf132ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60

orf132.pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA
orf132ng EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA

orf132.pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ
orf132ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ

orf132.pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY
orf132ng TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRFHRLQRTAAKPARY

orf132.pep FGQRLLDAGGKIRHGTRLA 259
orf132ng FGQRLLDAGGKIRHRTRLADW 261



An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino
acid sequence <SEQ ID 872>:

1 MKHIHIIGIG GTFMGGIAAI A KEAGFKVSG CDAKMYPPMS TQLEALGIGV 

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP 

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR 

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG 

251 KIRHRTRLAD W* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 873>: 



1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC

401 CGGGCTTCCT CATCGGCGGt g-accggaAA ATTTCGGCGT TTCCGCCCGC

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 ACCAAACTGC TGGACGCTTT GAGATAG



This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:



1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH

451 TKLLDALR*

ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:

orfl32ng-l.pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 

orf132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD

orf132ng-1.pep EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA

orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf 132ng-l . pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I 

orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orfl32ng-l.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 

I I I I I I I I I Mil 1111:11 : I I I I I I I 

orf 132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 

orfl32ng-l.pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA 

orf132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orfl32ng-l.pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

111:111:1111 I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG 

orfl32ng-l.pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 

I I I I I I H I I I I I I I I 11:11111111111111: II II I I I I 

orf 132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orfl32ng-l.pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX 

orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX

In addition, ORF132ng-1 is homologous to a hypothetical E.coli protein:

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDVVE 81
++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE
Sbjct: 21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q​PDLVIIGNAMTRGNPCVE 79

[Alignment rows for query residues 82 to 438 are not legible in the source; the surviving row start coordinates are Query 82, 142, 202, 262, 321, 380 and Sbjct 80, 140, 191, 251, 311, 371.]

Query: 439 LVMSNGGFGGIHTKLLDAL 457

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These 
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen. 

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875>:

1 . . CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC . GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC 

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>: 

1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the further partial DNA sequence <SEQ ID 877>: 

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG 

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT 

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA 

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG 

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT 

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA 

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT 

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG 

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT 

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg . CaCCG CAATACGACA

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG 

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG 

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC 

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC 

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC 

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC 

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG 

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG 

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG 

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC 

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT 

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA 

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC 

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA 

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT 

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC 

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC 

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA 

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC 

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT 

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT 

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT 

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG 

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT 

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC 

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG 

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG 

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT 

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC 

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA 

2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC 

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT

2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA 

2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA 

2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG 

2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA

2651 TGAGCTACAA GTTTTAA

This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:

1 EAQIQVLEDV HVKAKRV PKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI 

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG 

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN 

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW 

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL 

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT 

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL 

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF 

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT 

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL 

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS 

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV 

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF* 
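The partial ORF133 translation contains `X` wherever the underlying read was ambiguous, so placing it on the longer ORF133-1 sequence requires tolerating those positions. The sketch below treats `X` as a wildcard; the helper name `locate_fragment` is hypothetical, the fragment is residues 2 to 35 of SEQ ID 876, and the reference string is only a 50-residue window of SEQ ID 878 starting at residue 481 (as translated from SEQ ID 877), used here to keep the example short.

```python
import re


def locate_fragment(fragment: str, reference: str, wildcard: str = "X"):
    """Return the 0-based offset of fragment in reference, with X matching any residue."""
    pattern = "".join(
        "." if aa == wildcard else re.escape(aa) for aa in fragment
    )
    match = re.search(pattern, reference)
    return match.start() if match else None


ORF133_FRAGMENT = "GYYGSDDEFKRAFGENSPTXKKHCNRSCGIYEPV"                   # SEQ ID 876, residues 2-35
ORF133_1_WINDOW = "NYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPV"   # SEQ ID 878, residues 481-530
WINDOW_START = 481

offset = locate_fragment(ORF133_FRAGMENT, ORF133_1_WINDOW)
print(WINDOW_START + offset)  # 497
```

On this reading the partial sequence begins at about residue 497 of ORF133-1, and the `X` at its position 20 corresponds to the tyrosine resolved in the longer sequence.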

Computer analysis of this amino acid sequence gave the following results:

Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession number U32801)
ORF133 and HI121 show 57% aa identity in 363aa overlap:

Orfl33: 31 IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90 

I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPN I QEM+FSQ+ ++GV+TA 
HI121: 563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622 

Orfl33: 91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150 

LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W 
HI121: 623 LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW--RDGMPTWA 680

Orfl33: 151 SSTGLAYTIQHRXFXDKVHXXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNN 210 

S G YTI H+ + V YD GRFF N+SYAYQ++ QPTN++DAS PNN 

HI121: 681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN 740 

Orfl33: 211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID 270 

AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+ 
HI121: 741 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN 800

Orfl33: 271 GTNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330 

G+ + R+ ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP 

HI121: 801 GSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKDLIIKAEVQNLLDKRYVDP 859

Orfl33: 331 LDAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMS 390 

LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++ 
HI121: 860 LDAGNDAASQRYYSSL NNSIECAQDSSAC GGSDKTVLYNFARGRTYILSLN 910 

Orfl33: 391 YKF 393 
YKF 

HI121: 911 YKF 913 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N. 
meningitidis: 

10 20 30 

orf 133 .pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 

Ill : I I I I 

orf 133a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 
450 460 470 480 490 500 

40 50 60 70 80 90 

orf 133. pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 



orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS



orf 133. pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 



280 290 300 310 320 330

orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL
II! I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I

orf133a TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL
750 760 770 780 790 800 

340 350 360 370 380 390 

orf 133 . pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 

or f 133a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY 
810 820 830 840 850 860 



orf 133. pep KFX 

orfl33a KFX 
870 

A partial ORF133a nucleotide sequence <SEQ ID 879> is:

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC 

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC 

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA 

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC 

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA 

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC 

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA 

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG 

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC 

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 

2601 GAGCTACAAG TTTTAA 
This encodes a protein having (partial) amino acid sequence <SEQ ID 8



1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI
51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV
101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG
151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL
201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI
251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR
301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT
351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR
401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF
451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX
501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP
551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV
601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG
651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG
701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT
751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL
801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS
851 KSVLTNFARG XTFLITMSYK



ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 



10 20 30 40 

orf 133a. pep KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 

I I I I I I I I I I I 1 I I I I I I Mill! I I I I I I I I I 

orf 133-1 EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 
10 20 30 40 50 60 

50 60 70 80 90 100 

orf133a.pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDVVK

Ill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFTAGLDVVK 
70 80 90 100 110 120 



110 120 130 140 150 160 

orf133a.pep GSFSGSAGINSLAGSANLRTLXVDDVVQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL

orf133-1 GSFSGSAGINSLAGSANLRTLGVDDVVQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL
130 140 150 160 170 180 



170 180 190 200 210 220 

orf 133a . pep ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFEQEGGLKFNSNSGK 

orf 133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 
190 200 210 220 230 240 



230 240 250 260 270 280 

orf 133a. pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 

orf 133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 
250 260 270 280 290 



290 300 310 320 330 340 

orf 133a . pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
I I I I I I I I I I I I I I I i I II I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 
300 310 320 330 340 350 



350 360 370 380 390 400 

orf 133a . pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 

orf 133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 
360 370 380 390 400 410 



410 420 430 440 450 460 

orf 133a. pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
orf 13 3-1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 
470 480 490 500 510 520

LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGIYEPVLKKYGKKRA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I ! I I I I I I I I I I M 
LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGIYEPVLKKYGKKRA 
480 490 500 510 520 530 

530 540 550 560 570 580 

NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN
I I I I I I I I I I I I I I I I ! I ! I Ml! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 

TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 
600 610 620 630 640 650 



650 



660 



670 



690 



700 



KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS



orf 133a. pep 



RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG 
II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I 
RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG 
720 730 740 750 760 770



KRSIXQTETLARQPLI FDXYAAYEPKKXLI FRAEVKNLFDRRYIDPLDAGNDAATQRYYS 



SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX
SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX
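
The identity figures quoted for these alignments can be approximated by counting matching columns between two aligned rows. The following is a minimal sketch only: the function name `percent_identity` is hypothetical, gaps are assumed to be written as `-`, and an ambiguous residue `X` is simply treated as a mismatch unless both rows carry it, which may differ from the scoring of the alignment program actually used.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns of two equal-length rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    # Only columns where both rows carry a residue count towards the overlap.
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("no overlapping residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Closing rows of the alignment above (ORF133a on top, ORF133-1 below).
top = "SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX"
bottom = "SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX"
print(f"{percent_identity(top, bottom):.1f}% identity over {len(top)} columns")
```

Applied to a full pair of aligned rows, the same count gives an overall figure comparable to, though not necessarily identical with, the percentages stated in the text.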



Homology with a predicted ORF from N. gonorrhoeae

ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N. 
gonorrhoeae: 

orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 31

I I I II : : I I I I I I I I I I I : I : I I : III: 

orf133ng FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560

orf 133 .pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91 

orf133ng YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620

orf 133 .pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151 

orfl33ng KPERANTWQFGFNTYKKGLLK 680 

orf 133 .pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211 

orfl33ng STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 740 

orf 133 -pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271 

orfl33ng SKEDQLKQGYGLSRVSALPRDYG^ 800 

orf 133. pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331 

II I I I I II 1 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 I I I I I I I 

orf 133ng TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 8 60 
orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391

I I I I I I I :: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I 
orfl33ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920 

orfl33.pep KF 393 
I I 

orfl33ng KF 922 

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a
protein having amino acid sequence <SEQ ID 882>:

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLWLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGN KLTLGG AMRYFGKS IR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT
51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG
101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA
151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca
201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC
251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT
301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT
351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT
401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT
451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG
501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA
551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA
601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC
651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT
701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT
751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT
801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA
851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC
901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA
951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC
1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT
1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA
1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA
1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT
1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT
1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT
1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC
1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA
1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC
1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG
1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG
1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG
1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC
1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA
1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT
1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG
1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT 

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC

2751 GATGAGCTAC AAGTTTTAA

This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>:

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap:

10 20 30 40 50 60 

orf133ng-1.pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFTDARAV

I I I I I I I I I I I I I I I I 

orfl33-l EAQ I QVLEDVHVKAKRVPKDKKVFTDARAV 

10 20 30 

70 80 90 100 110 120 

orf133ng-1.pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS

orf 133-1 STRQDIFKSSENLDNIVRS IPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf133ng-1.pep TSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFSGSAGINSLAGSANLRTLGVDDVVQGN

I I I I I I I I I I I 1 I I I I I I I I I I I I I I II 

orf133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFSGSAGINSLAGSANLRTLGVDDVVQGN
100 110 120 130 140 150 

190 200 210 220 230 240 

orf 133ng-l . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 

orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI 
160 170 180 190 200 210 

250 260 270 280 290 300 

orfl33ng-l.pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 

orf133-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE

220 230 240 250 260 

310 320 330 340 350 360 

orf 133ng-l . pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 

orf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 

370 380 390 400 410 420 

orfl33ng-l.pep NRNYQFNYGLSLNPYTNLKLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 



430 440 450 460 470 480 

.rf 133ng-l . pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

.rf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
390 400 410 420 430 440 



PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 



550 560 570 580 590 600 

orf 133ng-l . pep GENSPAYKEHCDPSCGLYE PVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 



QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 



orf 133ng-l . pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 

orf 133-1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
630 640 650 660 670 680 

730 740 750 760 770 780 

orfl33ng-l.pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 

orf 133-1 STQPTNFSDASE5PNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 
690 700 710 720 730 740 



790 800 810 820 830 840

orf133ng-1.pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI

orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 



850 860 870 880 890 900 

orf133ng-1.pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS

orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
810 820 830 840 850 860 

910 920

orfl33ng-l .pep VLTN FARGRT FLMTMS YKFX 

M M M M M M M M M M 
orfl33-l VLTNFARGRTFLMTMSYKFX 
870 880 



In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae:

sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog -
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 

Query: 38 QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97 

+ L + V K + DKK FT+A+A STR++VFK + +D ++RS I PGAFTQQDK SG+V 
Sbjct: 29 ETLGQI DWEKVI SNDKKPFTEAKAKSTREN VFKETQT I DQVIRS I PGAFTQQDKGSGW 88 

Query: 98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFS 157 

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
Sbjct: 89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148 

Query: 158 GSAGINSLAGSANLRTLGVDDWQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 

G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 

Sbjct: 14 9 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

Query: 218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 YVGWYGYSQREVSQDYRI-GGGERLASLGQDILAKEKEAYF-RNAGYILNP-EGQWTPD 265 

Query: 278 LQRQYWK TKWY KKYEDPQELQK— - YIEE 303 

L +++W +Y KK +D ++LQK IEE 

Sbjct: 266 LSKKHWSCNKPDYQKNGDCSYYRIGSAAKTRREILQELLTNGKKPKDIEKLQKGNDGIEE 325 

Query: 304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 

Sbjct: 32 6 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

Query: 364 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

Sbjct: 385 NRNYQVNYNFNNNSYLDLNLMAAHNIGKTI YPKGGFFAGWQVADKLITKNVANIVDINNS 444 

Query: 424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSY — LGRFKGDKG 481 

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS+ GR+ G K 
Sbjct: 445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

Query: 4 82 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541 

LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
Sbjct: 505 LLPQRSVILQPSGKQKFKTVYFDTALSKGI YHLNYSVNFTHYAFNGEYVGY 555 

Query: 54 2 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

Sbjct: 556 — ENTAGQQ INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NYI 
Sbjct: 60 5 NIQEMFFSQVSNAGVNTALKPEQSCTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 

Query: 662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721 

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

Sbjct: 665 HNVYGVWW — RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 

Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 
Sbjct: 723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

Query: 782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841 

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

Sbjct: 783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

Query: 842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
Sbjct: 842 LIIKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC GGSD 892 

Query: 902 KSVLTNFARGRTFLMTMSYKF 922 

K+VL NFARGRT++++++YKF 
Sbjct: 8 93 KTVLYN FARGRTYI LSLNYKF 913 
The underlined motif in the gonococcal protein (also present in the meningococcal protein) is
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
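
The ATP/GTP-binding site motif A (P-loop) is catalogued in PROSITE as pattern PS00017, [AG]-x(4)-G-K-[ST]. As a hedged illustration of how such a motif can be located, the sketch below scans a protein string with an equivalent regular expression. The helper name `find_p_loops` is hypothetical, and the fragment is residues 771-800 of the gonococcal sequence <SEQ ID 884> as listed above; this is not necessarily the tool used for the original analysis.

```python
import re

# PROSITE PS00017, [AG]-x(4)-G-K-[ST], written as a regular expression.
# The lookahead wrapper lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str):
    """Return (1-based start, matched residues) for each P-loop occurrence."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein.upper())]


# Residues 771-800 of the gonococcal protein (illustrative fragment).
fragment = "WLGNKLTLGGAMRYFGKSIRATAEERYIDG"
offset = 770  # residues preceding the fragment in the full-length protein

for start, motif in find_p_loops(fragment):
    print(f"{motif} at residues {offset + start}-{offset + start + len(motif) - 1}")
```

A pattern this short can occur by chance, so a match is best read as a prediction to be confirmed experimentally.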

Example 104 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885>:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT . .

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYE LIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH. . . 
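
The correspondence between a DNA listing and its amino acid sequence can be checked by translating the nucleotides codon by codon. The sketch below assumes the standard codon assignments and reports any codon containing an ambiguity code (for example N) as X, matching the convention used in these listings. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; unknown codons become 'X'."""
    # Drop position numbers, spaces and dots so a listing row can be pasted in.
    cleaned = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    # A trailing partial codon, if any, is ignored.
    for i in range(0, len(cleaned) - 2, 3):
        protein.append(CODON_TABLE.get(cleaned[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of <SEQ ID 885>, copied from the listing above.
print(translate("1 ATGAACCTGA TTTCACGTTA CATCATCCGT"))
```

Because position numbers and spacing are stripped before translation, a row can be pasted exactly as it appears in the listing.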

Further work revealed the further partial nucleotide sequence <SEQ ID 887>:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 
901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 
951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG. . . 

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>:

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR A YE LIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW R KLVYPAAAW VMALVAFAF T PQTTRHGNMG 

301 LKLFGGICXG LLFHLA GRLF GFTSQL. . . 

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF112 shows 96.4% identity over a 166 aa overlap with an ORF (ORF112a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf112.pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR

orfll2a MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 112. pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

1:11111111111 I 1:111111 I 

orf 112a AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 

orf 112. pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 

1 I I I I I I I I I I I I ! I I I I I I I I : 

orf 112a VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 

130 140 150 160 170 180 

orf 112a ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
190 200 210 220 230 240 

The ORF112a nucleotide sequence <SEQ ID 889> is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG 

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGCTA A 

This encodes a protein having the amino acid sequence <SEQ ID 890>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 

51 GYTALKMXAR A YE LMPLAVL IGGLVSXSQ L AAGSELXVIK ASGMSTKKLL

101 LILSQFGFIF AIATV ALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQXXSQ NTRIYAIAWW R KLVYPAAAW VMALVAFAF T PQTTRHGNMG 

301 LKXFGGICLG LLFHL AGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap:

or f 112a . pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 

I I I I I I I II I I I 1 I I I I I I I 1 I I I I I I I I I II 

orfl!2-l MNLI SRYI IRQMAVMAVYALLAFLALYS FFE ILYETGNLGKGSYGIWEMLGYTALKMPAR 



orfll2a.pep 



AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
orf112a.pep



AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 

ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

LKLFGGICXGLLFHLAGRLFGFTSQL 



Homology with a predicted ORF from N. gonorrhoeae 

ORF112 shows 95.8% identity over 166aa overlap with a predicted ORF (ORF112ng) from N. 
gonorrhoeae: 



orfll2.pep 


MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 


60 


orfll2ng 


MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR


60 


orfll2.pep 


AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
1 1 1 1 : 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 
AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 


120 


orfll2ng 


120 


orfll2.pep 


VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 : 1 1 1 1 1 1 1 1 1 


166 


orf 112ng 


VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN


180 



The complete length ORF112ng nucleotide sequence <SEQ ID 891> is:



1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 

451 AAAGAAAAAa CCAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGTTG A 

This encodes a protein having amino acid sequence <SEQ ID 892>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 

351 RKQEKR* 
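
The relationship between the nucleotide sequence <SEQ ID 891> and the protein sequence <SEQ ID 892> can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()  # drop spacing, accept lower-case bases
    usable = len(dna) - len(dna) % 3    # ignore an incomplete trailing codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 60 and last 21 nucleotides of SEQ ID 891 as printed above.
first_60 = "ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT TTACGCGCTC"
last_21 = "CGCAAACAGG AAAAACGTTG A"

print(translate(first_60))  # MNLISRYIIRQMAVMAVYAL
print(translate(last_21))   # RKQEKR*
```

Both fragments reproduce the corresponding ends of <SEQ ID 892>; codons containing characters other than A, C, G or T are reported as `X` in this sketch.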

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap: 



orf112ng  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
orf112-1  MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR

orf112ng  AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW
orf112-1  AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW

orf112ng  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN
orf112-1  VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN

orf112ng  ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP
orf112-1  ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP
                 190       200       210       220       230       240

                 250       260       270       280       290       300
orf112ng  DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG
orf112-1  DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG
                 250       260       270       280       290       300

orf112ng  LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX
orf112-1  LKLFGGICXGLLFHLAGRLFGFTSQL



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
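
The percentage identity quoted for such comparisons is the share of aligned positions carrying the same residue. The following is a minimal sketch of that calculation for two pre-aligned, ungapped segments, using residues 181-240 of the ORF112ng and ORF112-1 alignment above as input. The function name `percent_identity` is illustrative, and treating an unknown residue (X) as a mismatch is an assumption of this sketch.

```python
def percent_identity(a, b):
    """Count identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# Residues 181-240 of the alignment, written in blocks of ten.
orf112ng = ("ELAEAVEADS" "AVLNSDGSWQ" "LKNIRRSIMG"
            "TDKIETSAAA" "EETWPIAVRR" "NLMDVLLVKP")
orf112_1 = ("ELAEAVEADS" "AVLNSDGSWQ" "LKNIRRSTLG"
            "EDKVEVSIAA" "EENWPISVKR" "NLMDVLLVKP")

matches, pct = percent_identity(orf112ng, orf112_1)
print(f"{matches}/{len(orf112ng)} identical ({pct:.1f}%)")  # 51/60 identical (85.0%)
```

This segment is the most divergent stretch of the alignment, which is why its local identity is lower than the overall figure reported for the full overlap.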



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 
TABLE I - PCR primers 

ORF       Primer   Restriction sites  Sequence
ORF 1     Forward  BamHI-NheI         CGCGGATCCGCTAGC-GGACACACTTATTTCGG
          Reverse  XhoI               CCCGCTCGAG-CCAGCGGTAGCCTAATT
ORF 2     Forward  BamHI-NdeI         GCGGATCCCATATG-TTTGATTTCGGTTTGGG
          Reverse  XhoI               CCCGCTCGAG-GACGGCATAACGGCG
ORF 2-1   Forward  BamHI-NdeI         GCGGATCCCATATG-TTTGATTTCGGTTTGGG
          Reverse  XhoI               CCCGCTCGAG-TGATTTACGGACGCGCA
ORF 4     Forward  BamHI-NdeI         GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC
          Reverse  XhoI               CCCGCTCGAG-TTTGGCTGCGCCTTC
ORF 5     Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC
          Forward  BamHI              CGGGATCC-ATGGAAGGCGCACAAC
          Reverse  XhoI               CCCGCTCGAG-GACTGTGCAAAAACGG
ORF 6     Forward  BamHI-NdeI         CGCGGATCCCATATG-ACCCGTCAATCTCTGCA
          Reverse  XhoI               CCCGCTCGAG-TC-CGCCGAACACTTTC
ORF 7     Forward  BamHI-NheI         CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC
          Reverse  XhoI               CCCGCTCGAG-TTTCAAAATATATTTGCGGA
ORF 8     Forward  BamHI-NdeI         GCGGATCCCATATG-GCTCAACTGCTTCGTAC
          Reverse  XhoI               CCCGCTCGAG-AGCAGGCTTTGGCGC
ORF 9     Forward  BamHI-NdeI         CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA
          Reverse  XhoI               CCCGCTCGAG-TTTCCGAGGTTTTCGGG
ORF 10    Forward  BamHI-NdeI         GCGGATCCCATATG-GACACAAAAGAAATCCTC
          Reverse  XhoI               CCCGCTCGAG-TAATGGGAAACCTTGTTTT
ORF 11    Forward  BamHI-NdeI         GCGGATCCCATATG-GCGGTCAACCTCTACG
          Reverse  XhoI               CCCGCTCGAG-GGAAACGACTTCGCC
ORF 13    Forward  BamHI-NdeI         CGCGGATCCCATATG-GCTCTGCTTTCCGCGC
          Reverse  XhoI               CCCGCTCGAG-AGGGTGTGTGATAATAAG
ORF 15    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-GCGGGACACTGACAG
          Forward  BamHI              CGGGATCC-TGCGGGACACTGACAGG
          Reverse  XhoI               CCCGCTCGAG-AGGTTGGCCTTGTCTATG
ORF 17    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG



WO 99/24578 



-488- 



PCT/IB98/01665 





ORF 17    Forward  BamHI              CGGGATCC-ATTGCCGGCCTGTTCG
          Reverse  XhoI               CCCGCTCGAG-AAGCAGGTTGTACAGC
ORF 18    Forward  BamHI-NdeI         GCGGATCCCATATG-ATTTTGCTGCATTTGGAT
          Reverse  XhoI               CCCGCTCGAG-TCTTCCAATTTCTGAAAGC
ORF 19    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC
          Forward  BamHI              CGGGATCC-TTCGCCAGTGTTTTTACCG
          Reverse  XhoI               CCCGCTCGAG-GGTGTTTTTGAAGCTGCC
ORF 20    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG
          Forward  BamHI              CGGGATCC-TTCGGCGCGGGTATG
          Reverse  XhoI               CCCGCTCGAG-CGGCGAGCGAGAGCA
ORF 22    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT
          Forward  BamHI              CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC
          Reverse  XhoI               CCCGCTCGAG-ATTATGATAGCGGCCC
ORF 23    Forward  BamHI-NdeI         CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC
          Reverse  XhoI               CCCGCTCGAG-TTTAAACCGATAGGTAAACG
ORF 24    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG
          Forward  BamHI              CGGGATCC-ATGATGCCGGAAATGGTG
          Reverse  XhoI               CCCGCTCGAG-TGTCAGCGTGGCGCA
ORF 25    Forward  BamHI-NdeI         GCGGATCCCATATG-TATCGCAAACTGATTGC
          Reverse  XhoI               CCCGCTCGAG-ATCGATGGAATAGCCG
ORF 26    Forward  BamHI-NdeI         GCGGATCCCATATG-CAGCTGATCGACTATTC
          Reverse  XhoI               CCCGCTCGAG-GACATCGGCGCGTTTT
ORF 27    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA
          Forward  BamHI              CGGGATCC-CAGACCTATTCTGTTTATTTTAATC
          Reverse  XhoI               CCCGCTCGAG-GGGTTCGATTAAATAACCAT
ORF 28    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT
          Forward  BamHI              CGGGATCC-AACGGCTGTACGTTGATG
          Reverse  XhoI               CCCGCTCGAG-TTTGTCAGAGGAATTCGCG
ORF 29    Forward  BamHI-NdeI         GCGGATCCCATATG-AACGGTTTGGATGCCCG
          Forward  BamHI-NheI         CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG
          Reverse  XhoI               CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG
ORF 32    Forward  BamHI-NdeI         CGCGGATCCCATATG-AATACTCCTCCTTTTG
          Reverse  XhoI               CCCGCTCGAG-GCGTATTTTTTGATGCTTTG
ORF 33    Forward  BamHI-NdeI         GCGGATCCCATATG-ATTGATAGGGATCGTATG
          Reverse  XhoI               CCCGCTCGAG-TTGATCTTTCAAACGGCC
ORF 35    Forward  BamHI-NdeI         GCGGATCCCATATG-TTCAGAGCTCAGCTT
          Forward  BamHI-NheI         CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT
          Reverse  XhoI               CCCGCTCGAG-AAACAGCCATTTGAGCGA
ORF 37    Forward  BamHI-NdeI         GCGGATCCCATATG-GATGACGTATCGGATTTT
          Reverse  XhoI               CCCGCTCGAG-ATAGCCCGCTTTCAGG
ORF 58    Forward  BamHI-NheI         CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT
          Reverse  XhoI               CCCGCTCGAG-AGCATTGTCCAAGGGGAC
ORF 65    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG
          Forward  BamHI              CGGGATCC-TTGCTGTATCTGAATCAAGG
          Reverse  XhoI               CCCGCTCGAG-CCGCATCGGCAGACA
ORF 66    Forward  BamHI-NdeI         GCGGATCCCATATG-TACGCATTTACCGCCG
          Reverse  XhoI               CCCGCTCGAG-TGGATTTTGCAGAGATGG
ORF 72    Forward  BamHI-NdeI         CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA
          Reverse  XhoI               CCCGCTCGAG-GCCTGAGACCTTTGCAA
ORF 73    Forward  BamHI-NdeI         GCGGATCCCATATG-AGATTTTTCGGTATCGG
          Reverse  XhoI               CCCGCTCGAG-TTCATCTTTTTCATGTTCG
ORF 75    Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTC-TTCAAACGGC
          Reverse  XhoI               CCCGCTCGAG-TTTGTTTTTGCAAGACAG
ORF 76    Forward  NheI-NdeI          GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC
          Reverse  BamHI              CGGGATCC-TTACGGTTTGACACCGTT
ORF 79    Forward  BamHI-NdeI         CGCGGATCCCATATG-GTTTCCGCCGCCG
          Reverse  XhoI               CCCGCTCGAG-GTGCTGATGCGCTTCG
ORF 83    Forward  BamHI-NdeI         GCGGATCCCATATG-AAAACCCTGCTGCTGC
          Reverse  XhoI               CCCGCTCGAG-GCCGCCTTTGCGGC
ORF 84    Forward  BamHI-NdeI         GCGGATCCCATATG-GCAGAGATCTGTTTG
          Reverse  XhoI               CCCGCTCGAG-GTTTGCCGATCCGACCA
ORF 85    Forward  BamHI-NdeI         CGCGGATCCCATATG-GCGGTTTGGGGCGGA
          Reverse  XhoI               CCCGCTCGAG-TCGGCGCGGCGGGC
ORF 89    Forward  NdeI-NcoI          GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA
          Forward  BamHI              CGGGATCC-GCCATACCTTCTTATCAGAG
          Reverse  XhoI               CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC
ORF 97    Forward  BamHI-NdeI         GCGGATCCCATATG-CATCCTGCCAGCGAAC








ORF 97    Reverse  XhoI               CCCGCTCGAG-TTCGCCTACGGTTTTTTG
ORF 98    Forward  BamHI-NdeI         GCGGATCCCATATG-ACGGTAACTGCGG
          Reverse  XhoI               CCCGCTCGAG-TTGTTGTTCGGGCAAATC
ORF 100   Forward  BamHI-NdeI         GCGGATCCCATATG-TCGGGCATTTACACCG
          Reverse  XhoI               CCCGCTCGAG-ACGGGTTTCGGCGGAA
ORF 101   Forward  BamHI-NdeI         GCGGATCCCATATG-ATTTATCAAAGAAACCTC
          Reverse  XhoI               CCCGCTCGAG-TTTTCCGCCTTTCAATGT
ORF 102   Forward  BamHI-NdeI         GCGGATCCCATATG-GCAGGGCTGTTTTACC
          Reverse  XhoI               CCCGCTCGAG-AAACGGTTTGAACACGAC
ORF 103   Forward  BamHI-NdeI         GCGGATCCCATATG-AACCACGACATCAC
          Reverse  XhoI               CCCGCTCGAG-CAGCCACAGGACGGC
ORF 104   Forward  BamHI-NdeI         GCGGATCCCATATG-ACGTGGGGAACGC
          Reverse  XhoI               CCCGCTCGAG-GCGGCGTTTGAACGGC
ORF 105   Forward  BamHI-NdeI         GCGGATCCCATATG-ACCAAATTTCAAACCCCTC
          Reverse  XhoI               CCCGCTCGAG-TAAACGAATGCCGTCCAG
ORF 106   Forward  BamHI-NdeI         GCGGATCCCATATG-AGGATAACCGACGGCG
          Reverse  XhoI               CCCGCTCGAG-TTTGTTCCCGATGATGTT
ORF 109   Forward  BamHI-NdeI         GCGGATCCCATATG-GAAGATTTATATATAATACTCG
          Reverse  XhoI               CCCGCTCGAG-ATCAGCTTCGAACCGAAG
ORF 110   Forward  EcoRI              AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC
          Reverse  PstI               AAACTGCAG-GGAAAACCACATCCGCACTCTGCC
ORF 111   Forward  EcoRI              AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA
          Reverse  PstI               AAACTGCAG-TCTGCGCGTTTTCGGGCAGGGTGG
ORF 113   Forward  EcoRI              AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG
          Reverse  PstI               AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG
ORF 115   Forward  EcoRI              AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG
          Reverse  SalI               AAAAAAGTCGAC-CTATTTTTTAGGGGC 2TTTGC TTGTTTGAAAAGCCTGCC
ORF 119   Forward  EcoRI              AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG
          Reverse  PstI               AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC
ORF 120   Forward  EcoRI              AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG
          Reverse  PstI               AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT
ORF 121   Forward  EcoRI              AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC
          Reverse  PstI               AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC
ORF 122   Forward  SalI               AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC
          Reverse  PstI               AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC
ORF 125   Forward  EcoRI              AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT
          Reverse  PstI               AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG
ORF 126   Forward  EcoRI              AAAGAATTC-GCGGAAACGGTCGAAG
          Reverse  PstI               AAACTGCAG-TTAATCTTGTCTTCCGATATAC
ORF 127   Forward  EcoRI              AAAGAATTC-ATGACTGATAATCGGGGGTTTACG
          Reverse  SalI               AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC
ORF 128   Forward  EcoRI              AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC
          Reverse  PstI               AAACTGCAG-CTAITGCAATGCGCCGCCGCGGGARTGrTTGAGCAGGCG
ORF 129   Forward  EcoRI              AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG
          Reverse  PstI               AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG
ORF 130   Forward  EcoRI              AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG
          Reverse  PstI               AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT
ORF 131   Forward  BamHI-NdeI         GCGGATCCCATATG-GAAATTCGGGCAATAAAAT
          Reverse  XhoI               CCCGCTCGAG-CCAGCGGACGCGTTC
ORF 132   Forward  BamHI-NdeI         GCGGATCCCATATG-AAAGAAGCGGGGTTTG
          Reverse  XhoI               CCCGCTCGAG-CCAATCTGCCAGCCGT
ORF 133   Forward  BamHI-NdeI         CGCGGATCCCATATG-GAAGATGCAGGGCGCG
          Reverse  XhoI               CCCGCTCGAG-AAACTTGTAGCTCATCGT
ORF 134   Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTGCAAGCAGTATTG
          Reverse  XhoI               CCCGCTCGAG-ATCCTGTGCCAATGCG
ORF 135   Forward  BamHI-NdeI         GCGGATCCCATATG-CCGTCTGAAAAAGCTTT
          Reverse  XhoI               CCCGCTCGAG-AAATACCGCTGAGGATG
ORF 136   Forward  BamHI-NheI         CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC
          Reverse  XhoI               CCCGCTCGAG-TTCCGAATATTTGGAACTTTT
ORF 137   Forward  BamHI-NdeI         CGCGGATCCCATATG-GGCACGGCGGGAAATA
          Reverse  XhoI               CCCGCTCGAG-ATAACGGTATGCCGCC
ORF 138   Forward  BamHI-NdeI         GCGGATCCCATATG-TTTCGTTTACAATTCAGGC
          Reverse  XhoI               CCCGCTCGAG-CGGCGTTTTATAGCGG
ORF 139   Forward  BamHI-NdeI         GCGGATCCCATATG-GCTTTTTTGGCGGTAATG
          Reverse  XhoI               CCCGCTCGAG-TAACGTTTCCGTGCGTTT
ORF 140   Forward  BamHI-NdeI         GCGGATCCCATATG-TTGCCCACAGGCAGC
          Reverse  XhoI               CCCGCTCGAG-GACGATGGCAAACAGC
ORF 141   Forward  BamHI-NdeI         GCGGATCCCATATG-CCGTCTGAAGCAGTCT
          Reverse  XhoI               CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT
ORF 142   Forward  BamHI-NdeI         GCGGATCCCATATG-GATAATTCTGGTAGTGAAG
          Reverse  XhoI               CCCGCTCGAG-AAACGTATAGCCTACCT
ORF 143   Forward  BamHI-NdeI         GCGGATCCCATATG-GATACCGCTTTGAACCT
          Reverse  XhoI               CCCGCTCGAG-AATGGCTTCCGCAATATG
ORF 144   Forward  BamHI-NdeI         GCGGATCCCATATG-ACCTTTTTACAACGTTTGC
          Reverse  XhoI               CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG
ORF 147   Forward  BamHI-NdeI         GCGGATCCCATATG-TCTGTCTTTCAAACGGC
          Reverse  XhoI               CCCGCTCGAG-TTTGTTTTTGCAAGACAG



NB: 

- restriction sites are underlined 



- for ORFs 110-130, where the ORF itself carries an EcoRI site (e.g. ORF 122), a SalI site 
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (e.g. 
ORFs 115 and 127), a SalI site was used in the reverse primer. 
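
The site assignments in Table I and the substitution rule in the note above can be checked mechanically. The following is a minimal sketch, assuming the standard recognition sequences of the enzymes named in the table and assuming that the hyphen in each primer separates the 5' tail from the ORF-specific part. The helper names `sites_in`, `tail_sites` and `pick_sites` are hypothetical and are not part of the original protocol.

```python
# Recognition sequences of the enzymes named in Table I.
SITES = {
    "BamHI": "GGATCC",
    "NdeI": "CATATG",
    "NheI": "GCTAGC",
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def sites_in(seq):
    """Return the enzymes whose site occurs in seq, ordered by first position."""
    seq = seq.upper()
    hits = [(seq.find(site), name) for name, site in SITES.items() if site in seq]
    return [name for _, name in sorted(hits)]


def tail_sites(primer):
    """List the sites in the 5' tail of a primer written as TAIL-ORFPART."""
    return sites_in(primer.split("-")[0])


def pick_sites(orf_seq):
    """Apply the rule stated for ORFs 110-130: avoid sites present in the ORF."""
    present = sites_in(orf_seq)
    forward = "SalI" if "EcoRI" in present else "EcoRI"
    reverse = "SalI" if "PstI" in present else "PstI"
    return forward, reverse


print(tail_sites("CGCGGATCCGCTAGC-GGACACACTTATTTCGG"))  # ['BamHI', 'NheI']
print(tail_sites("CCCGCTCGAG-CCAGCGGTAGCCTAATT"))       # ['XhoI']
print(pick_sites("ATGGAATTCAAA"))                        # ('SalI', 'PstI')
```

The last call uses a made-up 12-nucleotide sequence containing an internal EcoRI site, purely to illustrate the rule; it is not one of the ORFs of the specification.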
TABLE II - Summary of cloning, expression and purification 

ORF       PCR/cloning   His-fusion expression   GST-fusion expression   Purification
orf 1     +             +                       +                       His-fusion
orf 2     +             +                       +                       GST-fusion
orf 2.1   +             n.d.                    +                       GST-fusion
orf 4     +             +                       +                       His-fusion
orf 5     +             n.d.                    +                       GST-fusion
orf 6     +             +                       +                       GST-fusion
orf 7     +             +                       +                       GST-fusion
orf 8     +             n.d.                    n.d.
orf 9     +             +                       +                       GST-fusion
orf 10    +             n.d.                    n.d.
orf 11    +             n.d.                    n.d.
orf 13    +             n.d.                    +                       GST-fusion
orf 15    +             +                       +                       GST-fusion
orf 17    +             n.d.                    n.d.
orf 18    +             n.d.                    n.d.
orf 19    +             n.d.                    n.d.
orf 20    +             n.d.                    n.d.
orf 22    +             +                       +                       GST-fusion
orf 23    +             +                       +                       His-fusion
orf 24    +             n.d.                    n.d.
orf 25    +             +                       +                       His-fusion
orf 26    +             n.d.                    n.d.
orf 27    +             +                       +                       GST-fusion
orf 28    +             +                       +                       GST-fusion
orf 29    +             n.d.                    n.d.
orf 32    +             +                       +                       His-fusion
orf 33    +             n.d.                    n.d.
orf 35    +             n.d.                    n.d.
orf 37    +             +                       +                       GST-fusion
orf 58                  n.d.                    n.d.
orf 65    +             n.d.                    n.d.
orf 66    +             n.d.                    n.d.
orf 72    +             +                       n.d.                    His-fusion
orf 73    +             n.d.                    +                       n.d.
orf 75    +             n.d.                    n.d.
orf 76    +             +                       n.d.                    His-fusion
orf 79    +             +                       n.d.                    His-fusion
orf 83    +             n.d.                    +                       n.d.
orf 84    +             n.d.                    n.d.
orf 85    +             n.d.                    +                       GST-fusion
orf 89    +             n.d.                    +                       GST-fusion
orf 97    +             +                       +                       GST-fusion
orf 98    +             n.d.                    n.d.
orf 100   +             n.d.                    n.d.
orf 101   +             n.d.                    n.d.
orf 102   +             n.d.                    n.d.
orf 103   +             n.d.                    n.d.
orf 104   +             n.d.                    n.d.
orf 105   +             n.d.                    n.d.
orf 106   +             +                       +                       His-fusion
orf 109   +             n.d.                    n.d.
orf 110   +             n.d.                    n.d.
orf 111   +             +                       n.d.                    His-fusion
orf 113   +             +                       n.d.                    His-fusion
orf 115   n.d.          n.d.                    n.d.
orf 119   +             +                       n.d.                    His-fusion
orf 120   +             +                       n.d.                    His-fusion
orf 121   +             n.d.                    n.d.
orf 122   +             +                       n.d.                    His-fusion
orf 125   +             +                       n.d.                    His-fusion
orf 126   +             +                       n.d.                    His-fusion
orf 127   +             +                       n.d.                    His-fusion
orf 128   +             n.d.                    n.d.
orf 129   +             +                       n.d.                    His-fusion
orf 130   +             n.d.                    n.d.
orf 131   +             +                       +                       n.d.
orf 132   +             +                       +                       His-fusion
orf 133   +             n.d.                    +                       GST-fusion
orf 134   +             n.d.                    n.d.
orf 135   +             n.d.                    n.d.
orf 136   +             n.d.                    n.d.
orf 137   +             n.d.                    +                       GST-fusion
orf 138   +             n.d.                    +                       GST-fusion
orf 139   +             n.d.                    n.d.
orf 140
orf 141   +             n.d.                    n.d.
orf 142   +             n.d.                    n.d.
orf 143   +             n.d.                    n.d.
orf 144   +             n.d.                    +                       n.d.
orf 147   +             n.d.                    n.d.
CLAIMS 

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1. 

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 

104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 

304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 

504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 

704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892. 



5. A protein having 50% or greater sequence identity to a protein according to claim 4. 

6. A protein comprising a fragment of an amino acid sequence selected from the group 
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 

136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 

336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 

536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 

736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892. 

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6. 

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 

651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 

851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 
and 891. 

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 

93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 

295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493, 

495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 

695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891. 

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid 
molecule according to any one of claims 8 to 10. 

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence 
identity to a nucleic acid molecule according to any one of claims 8-11. 

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any 
one of claims 8-12 under high stringency conditions. 

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical. 

17. The use of a composition according to claim 14 in the manufacture of a medicament for the 
treatment or prevention of infection due to Neisserial bacteria. 
FIGURE 2 
Fig. 5A 
FIGURE 6 
FIGURE 11 
Fig. 13A 
FIGURE 14 
FIGURE 16 